Field notes from building a memory that lasts.
memkeeper is a persistent memory layer for AI agents. These are notes on how it works and why it is built the way it is: retrieval, measurement, local-first design, and the tools around it.
memkeeper on LongMemEval, where each question carries about fifty sessions of history: competitive answer accuracy, running locally at zero marginal cost, and the full abstention score, 0.833, for the category most leaderboards leave out.
A receipt for your AI coding plan: about 41x list-price-equivalent leverage on a $100 subscription, and the surprise that 96% of the tokens are the model re-reading context it has already seen.
memkeeper matches the published answer accuracy of commercial agent-memory systems on LoCoMo, running locally at zero marginal cost, and reports its abstention behavior, including the adversarial category most leaderboards leave out.
Point your memory layer at your docs folder and it quietly gets worse. Why memkeeper keeps curated memories and ingested documents in separate tiers, with a one-way bridge between them.
Most memory systems assert that they recall well. We ran memkeeper against a long-term conversational QA benchmark and published the numbers, the method, and the script to reproduce them.
Vector search is necessary but not sufficient for agent memory. Here is why memkeeper pairs dense embeddings with BM25 and a cross-encoder reranker, and what each stage covers that the others miss.
Your agent's memory is your context: decisions, preferences, facts about your systems. The default in this space is a hosted vector database. We think the default should be your own machine.
We hand coding agents the shell, the filesystem, and the network. Warden is a deny-by-default gate that decides what an agent is actually allowed to do. We logged 27,860 real decisions, listed the evasions it provably stops, and wrote down where its edges are.