PositioningDesignJune 23, 2026

Local-first memory for AI agents

Think about what an agent's memory actually contains. Not transcripts, the durable stuff: the decisions you have made, the way you like things done, facts about your systems, your projects, your people. Your rollback policy. Who owns billing. That your staging environment has mirrored prod since the March migration.

That is not incidental data. It is a compressed model of how you work. And the default architecture for storing it, today, is to send it to someone else's server.

The hosted default, and what it costs

Most memory products are a hosted vector database with an API in front. It is a reasonable starting point and a fine business model. But for memory specifically, that default quietly trades away four things:

Residency. Your memories, the model of how you work, live on infrastructure you do not control, subject to a retention policy you did not write.
Latency. Every recall is a network round-trip. Inside a prompt loop, that tax is paid on every query, forever.
Independence. Your accumulated memory becomes the lock-in. The longer you use it, the more expensive it is to leave.
A meter. A recurring bill that scales with how much you remember. The product's incentive and yours diverge.

memkeeper is local-first

memkeeper inverts the default. The store is a SQLite database on your disk. The embedding model and the cross-encoder reranker run locally through ONNX. Retrieval, the whole hybrid pipeline, happens on your machine. You reach it over MCP or the CLI. Nothing about a normal recall requires a network.

That is not a privacy feature bolted on. It is the architecture. Your memories stay yours because there is nowhere else for them to go by default.

But does local scale? For agent memory, the volumes are modest by design. memkeeper stores durable facts, decisions, and preferences, not raw transcripts, so a useful store is thousands of memories, not billions of vectors. Local hardware handles that comfortably: warm semantic search runs in about 25 ms. You do not need a cluster to remember what you decided last week.

Local-first, not local-only

Local-first is a default, not a cage. There are real reasons a team might want a shared, hosted tier someday, and that option can exist. The argument here is narrower and, we think, harder to dispute: the default door should be your own machine. Hosting should be the deliberate upgrade you opt into, not the only way in.

There is a smaller design value that follows from taking this seriously. When the semantic path is supposed to be on, memkeeper can be told to fail loudly rather than silently fall back to keyword-only retrieval. A memory system that quietly degrades is worse than one that tells you it is degraded. Local-first makes that honesty cheap, because the machine that should be doing the work is the one in front of you.

Your agents are going to accumulate a model of how you work whether you plan for it or not. The only real question is whose disk it lives on. We think the answer should be yours.

How the local retrieval works: hybrid retrieval beats pure vector search.