Warden · Security · June 23, 2026

A capability broker for AI coding agents

Nothing gets past the gate without a reason. And a receipt.

To be useful, a coding agent needs real capabilities: run shell commands, read and write files, reach the network. We hand it the keys because that is the whole point. Mostly it does exactly what we asked. The failure mode is not the agent turning hostile; it is the agent, or a skill it pulled in, quietly wandering outside its lane: writing where it should only read, running a command nobody scoped, touching a path it had no business touching.

Warden is the part of the Memkeeper family that addresses that. It is a capability broker and execution gate: before an agent's action runs, Warden decides whether a declared, auditable policy allows it, and it logs every decision. Allow or deny, with a reason, on the record. Most tools in this space assert least privilege. The point of this post is to show the ledger.

How it works

Capability-scoped. Actions are matched against typed capabilities (fs:read, fs:write, exec, net), with path and host scope matching that defends against traversal and suffix spoofing.
Declarative policy. A four-column TSV grants capabilities per principal: principal · class · scope · allow|deny. Skills declare the capabilities they need in their front-matter, so the ask is explicit and reviewable.
Deny-by-default, fail closed. If a policy does not grant it, it does not happen. If the broker is unreachable under enforce, the action is denied, not waved through.
Local and deterministic. No network and no LLM in the decision path. The same request against the same policy always gives the same answer.
Auditable. Every allow and every deny is appended to a JSONL log. The receipt is the point.
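To show how the four-column policy, deny-by-default, and the JSONL receipt fit together, here is a minimal Python sketch. It is not Warden's implementation (the core is Rust). The TSV details (blank lines and `#` comments skipped), the rule that an explicit deny outranks an allow, and the audit field names (`ts`, `principal`, `class`, `target`, `decision`, `reason`) are all assumptions made for illustration. Scope matching is deliberately simplified to an exact match or a component-wise path prefix.

```python
import json
import time
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import TextIO


@dataclass(frozen=True)
class Rule:
    principal: str
    cap_class: str  # e.g. "fs:read", "fs:write", "exec", "net"
    scope: str      # a path, program name, or host, depending on the class
    effect: str     # "allow" or "deny"


def parse_policy(text: str) -> list[Rule]:
    """Parse a four-column TSV: principal, class, scope, allow|deny."""
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: blanks and comments skipped
            continue
        cols = line.split("\t")
        if len(cols) != 4 or cols[3] not in ("allow", "deny"):
            raise ValueError(f"policy line {lineno}: expected principal<TAB>class<TAB>scope<TAB>allow|deny")
        rules.append(Rule(*cols))
    return rules


def scope_covers(scope: str, target: str) -> bool:
    """Simplified matcher: exact match, or an absolute path whose components start with the scope's."""
    if scope == target:
        return True
    s, t = PurePosixPath(scope).parts, PurePosixPath(target).parts
    return scope.startswith("/") and t[: len(s)] == s


def decide(rules: list[Rule], principal: str, cap_class: str, target: str) -> tuple[str, str]:
    """Deny-by-default: anything no rule grants is denied."""
    matching = [r for r in rules
                if r.principal == principal and r.cap_class == cap_class
                and scope_covers(r.scope, target)]
    # Assumption for this sketch: an explicit deny outranks any allow.
    if any(r.effect == "deny" for r in matching):
        return "deny", "explicit deny rule"
    if matching:
        return "allow", "matched a grant"
    return "deny", "no matching grant"


def audit(log_file: TextIO, principal: str, cap_class: str, target: str,
          decision: str, reason: str) -> None:
    """Append one decision, allow or deny, as a single JSON line."""
    record = {"ts": time.time(), "principal": principal, "class": cap_class,
              "target": target, "decision": decision, "reason": reason}
    log_file.write(json.dumps(record) + "\n")
```

Because the class is part of every match, an `fs:read` grant can never satisfy an `fs:write` request: the write finds no matching rule and falls through to the default deny.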

The ledger

A deny-by-default gate is only as honest as its log. So here is a real one: every decision Warden made on a single working machine over an eleven-day window (2026-06-11 to 2026-06-22), one principal, no synthetic traffic.

Decision	Count	Share
Allowed (matched a grant)	15,569	55.9%
Denied (no matching grant)	12,291	44.1%
Total brokered decisions	27,860	100%

The decisions break down by capability class as you would expect for a coding agent: it runs far more commands than it touches files.

Capability	Decisions	Share
`exec` (subprocess)	23,467	84.2%
`fs:read`	2,230	8.0%
`fs:write`	2,163	7.8%
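Both tables are plain tallies over the JSONL log. A minimal Python sketch of that tally follows; the field names `decision` and `class` are assumptions about the log schema, not the documented output of `warden log analyze`. Shares are rounded to one decimal place, as in the tables.

```python
import json
from collections import Counter


def shares(counts: Counter) -> list[tuple[str, int, float]]:
    """Turn raw counts into (value, count, percent) rows, largest first, percent to 1 d.p."""
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


def tally(jsonl_lines, field: str) -> list[tuple[str, int, float]]:
    """Count decision records in a JSONL log by one field, e.g. "decision" or "class"."""
    counts = Counter(json.loads(line)[field] for line in jsonl_lines if line.strip())
    return shares(counts)
```

Feeding `shares` the published counts reproduces the tables: 15,569 and 12,291 of 27,860 give 55.9% and 44.1%, and 23,467, 2,230, and 2,163 give 84.2%, 8.0%, and 7.8%.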

The number that matters is the 44.1% deny rate, and the honest reading of it. Every one of those 12,291 denials was a no-grant denial (deny-by-default): an action the starter allowlist simply did not cover. None came from an explicit deny rule. And for most of this window the gate ran in dry mode, so those denials were recorded, not enforced: the actions still ran.

That is not a weakness in the measurement; it is the whole argument for dry mode. Flip a fresh policy straight to enforce and you would have broken nearly half of this agent's real work on day one. Run dry first and the log tells you exactly which 44% you need to grant before you turn the key. You build least privilege from observed traffic instead of guessing at it up front.
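One way to turn a dry-run log into a starting allowlist is to group the no-grant denials by principal, class, and target, then rank them by frequency so the most common asks get reviewed first. The Python sketch below assumes JSONL records with hypothetical fields (`principal`, `class`, `target`, `decision`, `reason`) and emits candidate rows in the four-column TSV shape. It is not a Warden command, and every suggested row still needs a human look before it goes into a policy.

```python
import json
from collections import Counter


def candidate_grants(jsonl_lines, min_count: int = 1) -> list[tuple[str, int]]:
    """Rank no-grant denials from a dry-run log as candidate TSV allow rows."""
    counts = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # Only deny-by-default misses are candidates; explicit denies were deliberate.
        if rec["decision"] == "deny" and rec["reason"] == "no matching grant":
            counts[(rec["principal"], rec["class"], rec["target"])] += 1
    return [("\t".join((p, c, t, "allow")), n)
            for (p, c, t), n in counts.most_common() if n >= min_count]
```

Raising `min_count` filters out one-off actions, which keeps the first enforce policy close to the agent's routine work rather than every command it ever tried.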

What it provably stops

Counts show usage; they do not show that the gate is sound. For that, the question is whether scope matching can be tricked. Each row below is a specific evasion attempt with a test that asserts the outcome. These run on every build.

Attempt	Result
`../../etc/passwd` traversal out of a granted directory	denied
`/a/trading-secrets` against a grant for `/a/trading` (sibling-prefix)	denied
`api.anthropic.com.evil.com` against a grant for `api.anthropic.com` (DNS suffix spoof)	denied
Unbounded scope (`**`, bare `/`) in a grant	rejected
An `fs:read` grant used to authorize a write	denied
`sudo rm` / `timeout 5 curl` (wrapper commands)	unwrapped, gated on `rm` / `curl`
Broker unreachable while enforcing	fail closed (deny)

Scope matching is the part Warden enforces well. Path matching normalizes .. and compares on path components, not raw string prefixes, so a sibling directory whose name merely starts with an allowed one is not confused for it. Host matching rejects a name that only ends with an allowed suffix. Unbounded scopes are refused outright rather than silently granting everything.
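The three scope defenses can be sketched in a few lines of Python. This illustrates the techniques described above; it is not Warden's Rust code. Paths are joined to an assumed working directory and normalized lexically with `posixpath.normpath` (so, like Warden, it does not resolve symlinks), then compared component by component. Hosts are lowercased and match exactly or, as an assumption of this sketch, as a subdomain on a `.` boundary. The set of scopes treated as unbounded is a guess extrapolated from the two examples in the table.

```python
import posixpath
from pathlib import PurePosixPath

# Assumption: illustrative set of scopes that would mean "everything".
UNBOUNDED = {"**", "*", "/", ""}


def validate_scope(scope: str) -> str:
    """Refuse grants whose scope would silently cover everything."""
    if scope.strip() in UNBOUNDED:
        raise ValueError(f"unbounded scope refused: {scope!r}")
    return scope


def path_in_scope(grant: str, target: str, cwd: str) -> bool:
    """Lexical, component-wise containment check (grant assumed absolute; no symlink resolution)."""
    resolved = posixpath.normpath(posixpath.join(cwd, target))  # collapses "." and ".."
    g = PurePosixPath(posixpath.normpath(grant)).parts
    t = PurePosixPath(resolved).parts
    return t[: len(g)] == g  # compare path components, never raw string prefixes


def host_in_scope(grant: str, host: str) -> bool:
    """Exact host, or a subdomain on a '.' boundary; a bare string suffix never matches."""
    grant, host = grant.lower().rstrip("."), host.lower().rstrip(".")
    return host == grant or host.endswith("." + grant)
```

Comparing `parts` is what separates `/a/trading-secrets` from `/a/trading`: their third components differ, even though one string starts with the other. The same lexical approach is also why a symlink inside a granted directory is invisible to this check.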

The test surface

None of the above is interesting if you cannot check it. The defenses ship with the tests that prove them.

Suite	Tests	What it covers
Rust core (unit + integration)	61	scope matching, policy parsing, broker logic, audit writes
Claude Code gate (end-to-end)	32	the real hook, driven by subprocess against a built binary
MCP bridge	4	the MCP adapter

The 61 Rust tests run in well under a second and pass with a plain cargo test. The 32 gate tests matter most for trust: they do not unit-test a function in isolation; they spawn the actual warden binary and feed it the exact JSON that the Claude Code hook contract delivers, including the wrapper-unwrapping and fail-closed cases above.

git clone https://github.com/teflon07/memkeeper-warden
cd memkeeper-warden

cargo test                       # 61 Rust unit + integration tests
cd adapters/claude-code && pytest # 32 end-to-end gate tests

# inspect your own ledger once the gate is running
warden log analyze
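The end-to-end tests treat the gate as a black box: write a hook payload to its stdin, then read its exit status. Here is a pytest-style sketch of that pattern. The payload carries a subset of the Claude Code hook input fields (`hook_event_name`, `tool_name`, `tool_input`, `cwd`), and exit code 2 is how a Claude Code command hook blocks a tool call. The real `warden` invocation used by the adapter is not shown in this post, so to stay runnable anywhere the sketch drives a tiny, hypothetical Python stand-in gate instead of the built binary.

```python
import json
import subprocess
import sys


def run_gate(gate_cmd: list[str], tool_name: str, tool_input: dict,
             cwd: str = "/tmp") -> subprocess.CompletedProcess:
    """Feed one PreToolUse payload to a gate process on stdin and capture the result."""
    payload = {
        "hook_event_name": "PreToolUse",
        "tool_name": tool_name,
        "tool_input": tool_input,
        "cwd": cwd,
    }
    return subprocess.run(gate_cmd, input=json.dumps(payload),
                          capture_output=True, text=True, timeout=10)


# Hypothetical stand-in gate for this sketch only: blocks any Bash command whose program is `rm`.
# A real test would point GATE at the built warden binary instead.
STAND_IN = r'''
import json, shlex, sys
event = json.load(sys.stdin)
argv = shlex.split(event["tool_input"].get("command", ""))
if event["tool_name"] == "Bash" and argv[:1] == ["rm"]:
    print("denied: no grant for exec rm", file=sys.stderr)
    sys.exit(2)
sys.exit(0)
'''
GATE = [sys.executable, "-c", STAND_IN]


def test_rm_is_blocked():
    result = run_gate(GATE, "Bash", {"command": "rm -rf build"})
    assert result.returncode == 2
    assert "no grant" in result.stderr


def test_git_status_passes():
    assert run_gate(GATE, "Bash", {"command": "git status"}).returncode == 0
```

Testing through a real process boundary catches things a unit test cannot: argument parsing, stdin handling, and the exit code the hook runner actually sees.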

An honest threat model

Here is the part most tools in this space skip. Warden is a policy gate, not a sandbox, and it is worth being exact about that, because a guardrail you misjudge is worse than no guardrail. Within its model it is deny-by-default and fails closed, and it does its job: stopping accidental damage and skill or agent overreach. What it does not do, stated plainly:

Path matching is lexical. It normalizes .. but does not resolve symbolic links. A symlink inside an allowed directory that points outside it is followed by the OS, not caught by the policy. Keep allowed roots free of untrusted symlinks. Realpath hardening is planned.
The exec gate matches the wrapper, not the intent. It decides on the program name. It unwraps sudo and timeout, but it does not recurse into sh -c '...', bash -c '...', or python3 -c '...'. Grant a program that can run arbitrary subcommands and you have effectively granted what those subcommands can do.
Non-filesystem capabilities are decision-only in v0.1. For exec, net, and memory access the gate returns an allow/deny verdict and the adapter enforces it; the mediated forwarders that would carry out the action inside the broker are follow-on work. Only filesystem ops are brokered end-to-end today.
The audit log is not yet tamper-evident. It is an honest append-only record, but hash-chaining to make edits detectable is deferred.
We have not published a latency benchmark. The decision path is local, with no network and no LLM: a linear scan and a lexical compare. That makes it cheap by construction, but we have not yet measured or published per-decision numbers, and we will not quote any we have not run.
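To make the wrapper caveat concrete, here is a Python sketch of program-name extraction that peels `sudo` and `timeout` before gating and deliberately does not look inside `sh -c`. It is a simplified illustration, not Warden's parser: the per-wrapper option lists are an assumed subset (the real `sudo` and `timeout` accept more flags than this knows about).

```python
import os
import shlex

# Options that consume a following argument, per wrapper (simplified subset; an assumption).
WRAPPERS = {
    "sudo": {"-u", "-g"},
    "timeout": {"-s", "--signal", "-k", "--kill-after"},
}


def gated_program(command: str) -> str:
    """Return the program name an exec gate would decide on, after unwrapping known wrappers."""
    argv = shlex.split(command)
    i = 0
    while i < len(argv):
        name = os.path.basename(argv[i])
        if name not in WRAPPERS:
            return name  # note: `sh -c '...'` stops here and yields "sh"
        i += 1
        # Skip the wrapper's own options, plus the argument of options that take one.
        while i < len(argv) and argv[i].startswith("-"):
            i += 2 if argv[i] in WRAPPERS[name] else 1
        if name == "timeout":
            i += 1  # timeout's mandatory DURATION argument
    raise ValueError(f"no program to run in {command!r}")
```

The `sh -c` and `bash -c` cases are the point of the caveat: the gate sees `sh` or `bash`, so granting either is effectively granting whatever the quoted string does.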

Treat Warden as a guardrail, not a security boundary against code that is itself permitted to run. Its threat model is accidental damage and overreach, not a hostile harness actively trying to escape. True containment, a sandbox or forwarder, is a separate layer we have deliberately deferred rather than faked.

We write this down on purpose. A security tool that oversells its guarantees is how people get hurt. Warden tells you exactly where its edges are so you can decide what else you need behind it.

Dry by default

The reference integration wires Warden into Claude Code as a PreToolUse gate, so Bash and filesystem actions are brokered against your policy. It ships dry by default: it logs every decision and blocks nothing, so you build an allowlist from real usage first, then flip to enforce once the policy reflects how you actually work. The 27,860-row ledger above is what that first phase produces. The 44% you would otherwise have broken is what it saves you.
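The dry/enforce split and the fail-closed rule fit in one small function that a PreToolUse hook might run. In this Python sketch, `ask_broker` and `log` are hypothetical callables standing in for the broker query and the audit writer, and the mode names `dry` and `enforce` mirror the wording of this post rather than a documented setting. The exit-code convention (0 lets the tool run, 2 blocks it with the message on stderr) is the Claude Code hook convention.

```python
import sys
from typing import Callable


def gate_exit_code(mode: str, ask_broker: Callable[[], tuple[str, str]],
                   log: Callable[[str], None]) -> int:
    """Return the hook's exit code: 0 lets the tool run, 2 blocks it."""
    try:
        decision, reason = ask_broker()
    except OSError as exc:  # e.g. connection refused: the broker is unreachable
        decision, reason = "deny", f"broker unreachable: {exc}"
    log(f"{mode}\t{decision}\t{reason}")  # every decision is recorded, in either mode
    # Any mode other than "dry" blocks non-allow verdicts, so a typo'd mode fails closed too.
    if mode != "dry" and decision != "allow":
        print(f"warden: {reason}", file=sys.stderr)
        return 2
    return 0  # dry mode records the verdict but never blocks
```

Keying the block on `decision != "allow"` rather than on `decision == "deny"` means any unexpected verdict is treated as a denial under enforce, which is the fail-closed behavior described above.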

Memory and a gate are two halves of the same idea: an agent that remembers what matters, and one that is only allowed to do what it should. One keeps context. The other keeps a receipt.

Numbers are real decisions from a single workstation over eleven days; your mileage will differ with your policy and workload. Warden is pre-release (v0.1, MIT / Apache-2.0); the policy format and CLI may change before 1.0. This post publishes when Warden's repository goes public.