Positioning · Design · June 23, 2026

Local-first memory for AI agents

Think about what an agent's memory actually contains. Not transcripts, but the durable stuff: the decisions you have made, the way you like things done, facts about your systems, your projects, your people. Your rollback policy. Who owns billing. That your staging environment has mirrored prod since the March migration.

That is not incidental data. It is a compressed model of how you work. And the default architecture for storing it, today, is to send it to someone else's server.

The hosted default, and what it costs

Most memory products are a hosted vector database with an API in front. It is a reasonable starting point and a fine business model. But for memory specifically, that default quietly trades away four things:

Residency. Your memories, the model of how you work, live on infrastructure you do not control, subject to a retention policy you did not write.
Latency. Every recall is a network round-trip. Inside a prompt loop, that tax is paid on every query, forever.
Independence. Your accumulated memory becomes the lock-in. The longer you use it, the more expensive it is to leave.
A meter. You pay a recurring bill that scales with how much you remember, so the product's incentive and yours diverge.

memkeeper is local-first

memkeeper inverts the default. The store is a SQLite database on your disk. Lexical recall works with no model and no network, and the recommended semantic path runs the embedding model and cross-encoder reranker locally through ONNX. You reach the same store over MCP or the CLI. Nothing about the base memory layer requires someone else's server.
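
To make "no model and no network" concrete, here is a minimal sketch of lexical recall over a SQLite file, using only Python's standard library. It is an illustration of the idea, not memkeeper's actual schema or API: the table name `memories`, the columns `kind` and `body`, and the helpers `open_store`, `remember`, and `recall` are all hypothetical, and the sketch assumes your SQLite build includes the FTS5 extension (most do).

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local store. Hypothetical schema, for illustration only."""
    conn = sqlite3.connect(path)
    # FTS5 gives full-text indexing and BM25 ranking with no model involved.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(kind, body)"
    )
    return conn


def remember(conn, kind, body):
    """Store one durable memory (a fact, decision, or preference)."""
    conn.execute("INSERT INTO memories (kind, body) VALUES (?, ?)", (kind, body))
    conn.commit()


def recall(conn, query, limit=5):
    """Return up to `limit` (kind, body) rows, best lexical match first."""
    words = query.split()
    if not words:
        return []
    # Quote each word so punctuation is not parsed as FTS5 query syntax,
    # then OR the words together so any matching term can surface a memory.
    match = " OR ".join('"' + w.replace('"', '""') + '"' for w in words)
    rows = conn.execute(
        "SELECT kind, body FROM memories "
        "WHERE memories MATCH ? ORDER BY bm25(memories) LIMIT ?",
        (match, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store()  # pass a file path instead to persist to disk
    remember(store, "fact", "Dana owns billing.")  # made-up example data
    remember(store, "fact", "Staging has mirrored prod since the March migration.")
    print(recall(store, "who owns billing?"))
```

Everything above runs against a single file with no server process, which is the property the local-first default depends on.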

That is not a privacy feature bolted on. It is the architecture. Your memories stay yours because the default path is a local file, and off-device embeddings are an explicit configuration choice rather than a hidden dependency.

But does local scale? For agent memory, the volumes are modest by design. memkeeper stores durable facts, decisions, and preferences, not raw transcripts, so a useful store is thousands of memories, not billions of vectors. Local hardware handles that comfortably: warm semantic search runs in about 25 ms. You do not need a cluster to remember what you decided last week.
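
A back-of-envelope sketch of the gap between "thousands" and "billions". The 384-dimension, 4-bytes-per-value figures are assumptions chosen for illustration, not memkeeper's documented configuration, and `embedding_bytes` is a hypothetical helper.

```python
def embedding_bytes(num_memories, dim=384, bytes_per_value=4):
    """Raw storage for the embedding vectors alone (assumed float32, 384-dim)."""
    return num_memories * dim * bytes_per_value


if __name__ == "__main__":
    for n in (10_000, 1_000_000_000):
        megabytes = embedding_bytes(n) / 1_000_000
        print(f"{n:>13,} memories: {megabytes:,.2f} MB of vectors")
```

Under those assumptions, ten thousand memories need about 15 MB of vectors, while a billion would need about 1.5 TB. The first fits in memory on any laptop; only the second calls for a cluster.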

We can be specific about the hardware, because it is ours. The memory store, a paper-trading lab, a couple dozen scheduled jobs, and the agents that drive them all run on one base-model Mac mini M4: ten cores, 16 GB of memory, the $599 configuration. It idles near four watts and tops out around forty. That is the whole cluster.

Local-first, not local-only

Local-first is a default, not a cage. There are real reasons a team might want a shared, hosted tier someday, and that option can exist. The argument here is narrower and, we think, harder to dispute: the default door should be your own machine. Hosting should be the deliberate upgrade you opt into, not the only way in.

There is a smaller design value that follows from taking this seriously. When the semantic path is supposed to be on, memkeeper can be told to fail loudly rather than silently fall back to keyword-only retrieval. A memory system that quietly degrades is worse than one that tells you it is degraded. Local-first makes that honesty cheap, because the machine that should be doing the work is the one in front of you.
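
The fail-loudly choice can be sketched as a small policy around two search functions. This is an illustration under stated assumptions, not memkeeper's real interface: `recall_with_policy`, `SemanticUnavailable`, and the `require_semantic` flag are hypothetical names.

```python
class SemanticUnavailable(RuntimeError):
    """Raised when semantic recall is required but cannot run."""


def recall_with_policy(query, lexical_search, semantic_search=None,
                       require_semantic=False):
    """Run semantic search if possible; otherwise fail loudly or degrade openly.

    `lexical_search` and `semantic_search` are caller-supplied callables that
    take a query string and return a list of results.
    """
    if semantic_search is not None:
        try:
            return {"mode": "semantic", "degraded": False,
                    "results": semantic_search(query)}
        except Exception as exc:
            if require_semantic:
                # Loud failure: the caller asked for semantic recall and
                # must be told it did not happen.
                raise SemanticUnavailable(
                    f"semantic search failed: {exc}") from exc
            # Permitted fallback, but labelled so the caller can see it.
            return {"mode": "lexical", "degraded": True,
                    "results": lexical_search(query)}
    if require_semantic:
        raise SemanticUnavailable("no semantic search is configured")
    # Lexical-only was the configured mode, so this is not a degradation.
    return {"mode": "lexical", "degraded": False,
            "results": lexical_search(query)}
```

Even on the fallback path the result carries `degraded: True`, so a quiet downgrade never looks identical to a healthy semantic answer.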

Your agents are going to accumulate a model of how you work whether you plan for it or not. The only real question is whose disk it lives on. We think the answer should be yours.

How the local retrieval works: hybrid retrieval beats pure vector search.