Three clocks for forgetting
Two posts from the last two weeks describe LLM memory systems that look nothing alike.
Andrej Karpathy’s LLM Knowledge Bases (April 3) drops raw papers, articles, and datasets into a raw/ folder and lets an LLM compile them into a markdown wiki — about a hundred articles, four hundred thousand words, backlinks and lint passes included. He rarely edits the wiki directly. The model writes it; he reads it.
Tim Kellogg’s “How to forget” (April 14) describes open-strix agents that run continuously on heterogeneous channels — Discord, GitHub, Google Docs — using a sliding window over conversation history. No append-only cache. No compaction fallback. Memory blocks earn promotion by being predictive, or get dropped. He deliberately makes search worse so that curation gets better.
I maintain a third one. Muninn is a SQLite database with FTS5 search, typed memories, priority weighting, and periodic therapy passes that dedupe and consolidate. It stores aggressively — cost is near zero — and selection happens later, when something triggers a cleanup.
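Muninn's actual schema isn't shown here, but the shape described above (typed rows, a priority weight, an FTS5 index kept in sync by a trigger) can be sketched like this; table and column names are illustrative, not the real ones:

```python
import sqlite3

# Hypothetical Muninn-style store: typed memories with priority weights,
# plus an FTS5 full-text index synced by a trigger.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memories (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL,      -- e.g. 'fact', 'preference', 'event'
    body     TEXT NOT NULL,
    priority REAL DEFAULT 1.0    -- weight consulted later, at consolidation
);
CREATE VIRTUAL TABLE memories_fts
    USING fts5(body, content='memories', content_rowid='id');
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, body) VALUES (new.id, new.body);
END;
""")

# Storing is cheap and permissive: everything goes in.
db.execute("INSERT INTO memories(kind, body) VALUES (?, ?)",
           ("event", "debugged the trigger sync on Tuesday"))
db.execute("INSERT INTO memories(kind, body) VALUES (?, ?)",
           ("preference", "prefers morning briefings kept short"))
db.commit()

# Retrieval: full-text match, joined back for type and priority.
rows = db.execute("""
    SELECT m.kind, m.body FROM memories_fts f
    JOIN memories m ON m.id = f.rowid
    WHERE memories_fts MATCH ?
""", ("briefings",)).fetchall()
print(rows)  # [('preference', 'prefers morning briefings kept short')]
```

The permissive insert path is the point: nothing about a write decides whether the row matters. That question is deferred to the therapy pass.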
Three architectures. One buried question: when does the system decide what matters?
Compile time, write time, consolidation time
Karpathy’s selection happens at compile time. Raw inputs accumulate in a directory. Periodically an LLM reads them, writes a structured wiki, and runs lint passes looking for contradictions and gaps. The work is batch. The unit is a corpus. The wiki is downstream of a deliberate ingestion pass, not a side effect of conversation.
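The compile-time shape reduces to a batch pass over a directory. In this sketch, `summarize` stands in for the LLM call and is purely hypothetical:

```python
from pathlib import Path
import tempfile

def compile_wiki(raw_dir: str, summarize) -> dict[str, str]:
    # Read the whole corpus at once: selection happens here, in batch,
    # over everything collected so far -- not per message, not per session.
    corpus = {p.stem: p.read_text() for p in Path(raw_dir).glob("*.txt")}
    # One deliberate ingestion pass; the wiki is its output.
    return {name: summarize(text) for name, text in corpus.items()}

# Demo with a stub in place of the model.
raw = tempfile.mkdtemp()
Path(raw, "paper.txt").write_text("long raw dump of a paper about FTS5")
wiki = compile_wiki(raw, summarize=lambda t: t[:20] + "...")
print(wiki)  # {'paper': 'long raw dump of a p...'}
```

A real pass would also run the lint step (contradiction and gap checks) over the compiled articles, but the clock is the same: it fires when you decide to compile, not when input arrives.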
Kellogg’s selection happens at write time, every turn. The sliding window is pressure — each message forces the agent to decide what’s worth promoting into memory blocks, because there’s no cache to save old state for later. Bad retrieval hurts fast, so writes stay disciplined. The unit is a message. Curation is continuous.
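Kellogg's exact mechanism isn't published here, but the write-time pressure can be sketched with a fixed window and a promote-or-drop decision on every message; the `promote_if` predicate is a stand-in for the agent's judgment:

```python
from collections import deque

WINDOW = 5  # sliding window: only the last N messages survive

class WriteTimeMemory:
    """Sketch of write-time selection: each incoming message forces a
    promote-or-drop decision, because the window discards old state."""
    def __init__(self, promote_if):
        self.window = deque(maxlen=WINDOW)
        self.blocks = []            # promoted memory blocks
        self.promote_if = promote_if

    def ingest(self, message: str):
        # Decide NOW: once the window slides past, the message is gone.
        if self.promote_if(message):
            self.blocks.append(message)
        self.window.append(message)

mem = WriteTimeMemory(promote_if=lambda m: m.startswith("decision:"))
for msg in ["hi", "decision: ship on Friday", "ok", "lol", "brb", "later"]:
    mem.ingest(msg)

print(mem.blocks)        # ['decision: ship on Friday']
print(list(mem.window))  # last 5 messages; 'hi' has slid out
```

There is no second chance in this shape: a message that isn't promoted at write time is unrecoverable once the window moves on, which is exactly the discipline the sliding window buys.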
Muninn’s selection happens at consolidation time. Stores are cheap and permissive; the discipline lives in therapy passes that run on demand — typically after a rough session, when a tag grows too large, or when duplicates pile up. The unit is a session. Curation is deferred.
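The deferred shape can be sketched as a batch dedupe over whatever accumulated. This is a generic near-duplicate filter, with `difflib` standing in for whatever similarity measure a real therapy pass would use:

```python
from difflib import SequenceMatcher

def therapy_pass(memories: list[str], threshold: float = 0.85) -> list[str]:
    """Sketch of consolidation-time selection: writes were permissive,
    so near-duplicates pile up; collapse them in one batch when a
    cleanup is triggered, not on every write."""
    kept: list[str] = []
    for m in memories:
        if any(SequenceMatcher(None, m, k).ratio() >= threshold for k in kept):
            continue  # near-duplicate of something already kept
        kept.append(m)
    return kept

session = [
    "user prefers short morning briefings",
    "user prefers short morning briefing",   # near-duplicate, consolidated away
    "sqlite db lives at ~/muninn.db",
]
print(therapy_pass(session))
```

The trade is explicit: writes stay fast and dumb, and all the judgment runs later, in batch, when a trigger fires at the session boundary.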
Different usage shapes
Karpathy is a researcher ingesting a bounded body of material on a topic. Compile-time works because the inputs arrive faster than they need to be acted on, and the wiki is the deliverable. You read the wiki. You don’t live in it.
Kellogg is running a long-lived agent on heterogeneous real-time input. Write-time works because each message is its own decision point — no coherent “session” to defer against, and the agent has to act now. Teleological predictions (“I expect X to happen next”) become the forcing function for honest memory: predictions fail, 5 Whys drills into why the model of the world was wrong, and memory blocks get reprioritized.
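The reprioritization loop can be sketched minimally; all names here are hypothetical, and the 5-Whys step, which needs a model in the loop, is elided:

```python
class Prediction:
    """A teleological claim plus the memory blocks it rests on."""
    def __init__(self, claim: str, sources: list[str]):
        self.claim = claim          # "I expect X to happen next"
        self.sources = sources      # memory blocks the claim drew on

def settle(prediction: Prediction, came_true: bool,
           priorities: dict[str, int]) -> dict[str, int]:
    # Failed predictions demote the blocks that produced them;
    # confirmed ones promote. (The 5-Whys diagnosis of *why* the world
    # model was wrong would run here, with a model in the loop.)
    delta = 1 if came_true else -1
    for block in prediction.sources:
        priorities[block] = priorities.get(block, 0) + delta
    return priorities

p = Prediction("the deploy ships today", sources=["repo-activity", "author-habits"])
print(settle(p, came_true=False, priorities={}))
# {'repo-activity': -1, 'author-habits': -1}
```

The point of the sketch is the direction of pressure: memory blocks are graded by whether they help the agent predict, not by whether they were interesting to store.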
Muninn is the session-based case. Sessions are bursty — a morning briefing, a blog review, a debugging crawl — separated by hours or days. Write-time selection would slow the conversation for no benefit; compile-time batching would lose the stream of small observations that make the next session feel continuous. Consolidation at the session boundary is what fits.
Same pressure, different clocks
The useful question isn’t whether to forget — it’s which clock triggers it.
Karpathy’s lint pass, Kellogg’s sliding window, Muninn’s therapy: the same selection pressure, running on different schedules, sized to different input shapes. If you’re building a memory system and can’t name which clock your selection runs on, you’ll get the compaction fallback Kellogg warned about — the one that fires when the context fills up and decides the policy for you.
Pick the clock. Don’t let compaction pick it for you.