memkeeper

Getting started: give your agent a memory in ten minutes

The rest of this blog argues about retrieval, benchmarks, and design. This post is the other thing: the actual commands. By the end you will have memkeeper installed, a memory written and recalled from the command line, and the same store wired into a coding agent so it can remember across sessions. No hosted service is required, lexical recall needs no API key, and the store is one SQLite file on your own disk.

Here is the core of it first: create a store, write one fact, then recall it with a question worded differently from the stored sentence. Those three commands, run against the real binary, appear below, copy-paste ready.

Install

memkeeper is a self-contained Rust binary, published for macOS (Apple Silicon) and Linux x86_64. Install the latest release with one command:

curl -fsSL https://raw.githubusercontent.com/teflon07/memkeeper/main/install.sh | bash

That downloads the binary for your platform, verifies its SHA-256 checksum, and installs it to ~/.local/bin. Prefer to do it by hand? Grab a tarball from the releases page (and verify its .sha256), or build from source with a Rust toolchain (git clone && cargo install --path crates/memkeeper-cli). Either way, confirm it is there:

memkeeper --help
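
If the shell reports that memkeeper is not found, one likely cause is that ~/.local/bin is not on your PATH. Whether the install script edits PATH for you is not stated here, so the following is a minimal sketch under that assumption; the helper names path_has_dir and ensure_local_bin are made up for illustration.

```shell
# Succeeds if the given directory is an entry in $PATH.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Append ~/.local/bin to PATH for the current shell session if it is missing.
# Hypothetical helper; it does not touch your shell profile.
ensure_local_bin() {
  path_has_dir "$HOME/.local/bin" || PATH="$PATH:$HOME/.local/bin"
  export PATH
}
```

Run ensure_local_bin and then memkeeper --help again. To make the change permanent, add the same PATH line to your shell profile.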

Create a store

Initialize the database. With no flags it lands at ~/.memkeeper/store.sqlite, which is where every command looks by default.

memkeeper init

That is the entire setup for the store. It is a single SQLite file. Back it up by copying it; move it to another machine by copying it there. There is no external database or hosted service to stand up.
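
Since a backup is a file copy, it can be scripted in a few lines. This is a sketch, not a memkeeper feature: backup_store and the ~/memkeeper-backups directory are hypothetical names, and it assumes nothing (neither the CLI nor an agent) is writing to the store while the copy runs.

```shell
# Copy the store to a timestamped file and print the path of the copy.
# Arguments (both optional): source store, destination directory.
# Assumption: no process is writing to the store during the copy.
backup_store() {
  src="${1:-$HOME/.memkeeper/store.sqlite}"
  dest_dir="${2:-$HOME/memkeeper-backups}"   # hypothetical location
  [ -f "$src" ] || { echo "no store at $src" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  dest="$dest_dir/store-$(date +%Y%m%d-%H%M%S).sqlite"
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Calling backup_store with no arguments copies the default store; restoring is the reverse copy.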

Write and recall your first memory

memkeeper takes its input as JSON, so commands compose cleanly into scripts and agents. Write a durable fact with remember:

memkeeper remember --json '{"content":"memkeeper stores memories in a local SQLite database"}'

Now recall it. search takes a query and an optional limit:

memkeeper search --json '{"query":"where are memories stored","limit":3}'

The query is a question, worded differently from the stored sentence, and the right memory still comes back. Check what is in the store at any time:

memkeeper stats --json

The --json value is flexible. Pass a literal string as above, point at a file with @path/to/file.json, or pipe from stdin with -. That is what makes memkeeper easy to drive from a shell script or another program, not just by hand.
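
As a rough illustration of the stdin and @file forms, here is a sketch of two shell wrappers. The function names json_string, remember_text, and search_text are invented for this example, and the escaping is deliberately minimal: it handles backslashes and double quotes in single-line text only, so use a real JSON tool for arbitrary input.

```shell
# Wrap a single-line string as a JSON string literal.
# Escapes backslashes and double quotes only (no newlines or control characters).
json_string() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Write one fact, sending the payload on stdin ("--json -").
remember_text() {
  printf '{"content":%s}\n' "$(json_string "$1")" | memkeeper remember --json -
}

# Search with a query and an optional limit (default 3),
# passing the payload as a temporary file ("--json @path").
search_text() {
  payload=$(mktemp) || return 1
  printf '{"query":%s,"limit":%d}\n' "$(json_string "$1")" "${2:-3}" > "$payload"
  memkeeper search --json "@$payload"
  rc=$?
  rm -f "$payload"
  return "$rc"
}
```

With these defined, remember_text 'we deploy from the release branch, never main' writes a fact and search_text 'which branch do we deploy from' looks it up, without hand-quoting JSON on the command line.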

Turn on semantic search

What you just saw runs in lexical mode (BM25 over SQLite's full-text index): fast, and the default with zero setup. For the full hybrid pipeline, dense embeddings plus a cross-encoder reranker, memkeeper gives you two ways to add semantics. Which one you use depends on the binary you installed.

On-device, the recommended path. The release binary has the local model runtime built in, so there is nothing to rebuild. Fetch the models once and everything stays on your machine, with nothing sent anywhere:

memkeeper pull-models               # embed + rerank models, ~2.1GB
memkeeper pull-models --quantized   # smaller, ~0.6GB

That is the whole setup. pull-models writes to ~/.memkeeper/models/, exactly where memkeeper looks by default, so semantic search turns on with no environment variables to set. Run a search and it is active. (If the models are ever missing, memkeeper doctor says so and points you back here.)

No download: point at an embeddings API. Rather than downloading models, you can run semantic search by calling an OpenAI-compatible embeddings endpoint. Set a few environment variables and you have the hybrid pipeline with no multi-gigabyte download:

export MEMKEEPER_EMBED_PROVIDER=openai
export MEMKEEPER_EMBED_BASE_URL=https://api.openai.com/v1/embeddings
export MEMKEEPER_EMBED_API_KEY=sk-...
export MEMKEEPER_EMBED_MODEL=text-embedding-3-small
export MEMKEEPER_EMBED_DIMS=1536

This is the off-device mode: your queries and memory text go to whatever endpoint you point at, which is the trade for skipping the download. If you would rather nothing leave the machine, use the on-device path above. Either way, if you have already written memories in lexical mode, backfill their embeddings once after switching:

memkeeper reindex --embed

Three modes, one choice up front. Lexical (zero setup), on-device semantic (local models), or off-device semantic (an embeddings API). The embedding backend is recorded in the store, so switching later means re-embedding with reindex --embed, not a flag flip.
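
Because switching to the off-device mode means a re-embed, it can be worth checking the environment before you start one. The sketch below reports which of the five variables listed earlier are unset or empty. The function name check_embed_env is made up, and whether memkeeper strictly requires all five is an assumption.

```shell
# Report which off-device embedding variables are unset or empty.
# Returns 0 only if all five are exported with non-empty values.
# Assumption: memkeeper needs every one of them; the source does not say.
check_embed_env() {
  missing=0
  for name in MEMKEEPER_EMBED_PROVIDER MEMKEEPER_EMBED_BASE_URL \
              MEMKEEPER_EMBED_API_KEY MEMKEEPER_EMBED_MODEL \
              MEMKEEPER_EMBED_DIMS; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical use would be check_embed_env && memkeeper reindex --embed, so the backfill only starts when the configuration looks complete.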

Wire it into your agent

This is the part that matters. memkeeper speaks MCP natively, so a coding agent can call it directly. The binary serves MCP with memkeeper mcp. Add this to your agent's MCP config, for example the mcpServers block in a Claude Code or Claude Desktop config file:

{
  "mcpServers": {
    "memkeeper": {
      "command": "memkeeper",
      "args": ["mcp"]
    }
  }
}

Restart the agent and it now has two core tools: remember to write a durable fact, and search to recall one later. memkeeper is not a transcript miner; capture is explicit. Tell the agent something worth keeping ("we deploy from the release branch, never main") and have it call remember. Ask about that later in a fresh session, with the original conversation long gone from the context window, and it can call search to get the fact back.
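
If the agent does not list the tools after a restart, it can help to check the server outside the agent. The sketch below sends a single MCP initialize request over stdin and prints the first line of the reply. It assumes memkeeper mcp uses the standard stdio transport (newline-delimited JSON-RPC) and exits when stdin closes. The function names and the clientInfo values are placeholders.

```shell
# Print a minimal MCP "initialize" request as one line of JSON-RPC.
mcp_initialize_request() {
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.1"}}}'
}

# Send the request to the server and show the first line it writes back.
# Assumption: the server speaks MCP over stdio and stops at end of input.
mcp_smoke_test() {
  mcp_initialize_request | memkeeper mcp | head -n 1
}
```

A server that follows the MCP specification should answer with a JSON-RPC response carrying "id":1 and a result object. If the command prints nothing or hangs, interrupt it with Ctrl-C and run memkeeper doctor.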

The store is shared. The CLI and the MCP server read and write the same ~/.memkeeper/store.sqlite. A memory you write by hand is recallable by the agent, and a memory an agent writes is one you can inspect with memkeeper search. One memory, many front doors.

A look at what is in there

memkeeper ships a local dashboard if you would rather see the store than query it:

memkeeper serve --http

That serves a read-only view at http://127.0.0.1:7777, on your machine only. Useful for confirming the agent is actually writing what you expect, and for browsing how memories link together.

That's the loop

Install, init, remember, search, and a short MCP config block to hand the same store to an agent. Everything stays in one local file you own. From here the interesting questions are the ones the rest of this blog covers: how the retrieval actually works, why documents are kept separate from memories, and why the memory is built to tell you when it cannot answer.