What your AI coding subscription actually buys
A flat-rate AI coding subscription is a black box. You pay $100 a month and tokens go in and code comes out, and you have no idea whether you are getting a deal or leaving value on the table. The data to answer that is already sitting on your disk: every coding agent writes a transcript of each session locally. We built a small tool that reads those transcripts and turns them into a receipt.
It is called token-burn: one zero-dependency Python script, nothing leaves your machine, it just sums the token usage your agents already recorded. Here is what 30 days on a $100 plan looked like when we pointed it at our own.
The receipt
| 30 days, one $100 plan | |
|---|---|
| What you pay | $100 |
| API-list-equivalent value | ~$4,077 |
| Leverage | ~41× |
| Tokens | 5.03B |
| Billed turns | 30,216 |
The same usage, billed at public API list prices, would run about $4,077. That is roughly 41× the subscription price. One precise note on what that number is: it is list-price-equivalent value, not money saved. You were never going to buy 5 billion tokens at list, and the subscription is not a refund. It is a measure of how much compute the flat rate actually hands you.
Where the tokens actually go
The leverage is the headline. The breakdown is the surprise. Of those 5 billion tokens, about 96% were cache reads: the model re-reading context it had already been shown. Freshly generated output, the actual new code and prose, was about 0.6%.
This is why the subscription economics work at all: cache reads are cheap per token, so a provider can let you re-read enormous amounts of context under a flat rate. It is also where any efficiency lever has to operate. If you want to spend fewer tokens, you reduce what gets re-read, not what gets generated.
What the agent actually does
Tokens are one cut of the transcript. The commands are another. The same local files record every shell command the agent runs, and the distribution is its own kind of receipt. Out of roughly 21,500 shell commands in the same 30 days, here is where they went.
| Top shell programs | Share of commands |
|---|---|
| echo | 19% |
| grep | 14% |
| git | 11% |
| python3 | 9% |
| find | 5% |
| cargo | 4% |
| ls | 4% |
| sed | 4% |
The shape matches the token story. Lump the commands by what they do and almost a third, about 32%, is searching and reading the codebase: grep, find, ls, cat, sed. Another 16% is running code, python and cargo invoking tests and builds. 11% is git. There is nothing exotic in the long tail. The agent spends its time gathering and re-reading context, then committing, which is the same thing the tokens said from a different angle.
We measured it wrong the first time
The first run reported 12.3 billion tokens and 106× leverage. That was wrong. Resumed and forked agent sessions copy earlier turns into new transcript files, so a naive sum counts the same billed turn more than once. Deduplicating on the message and request IDs cut the total by about 2.4×, down to the 5.03 billion and 41× above. Those are the corrected numbers, and they are the ones in the table. The tool ships with the dedup built in so you do not inherit the same mistake.
Run it on your own plan
The whole point is that you can check this against your own usage in about ten seconds. token-burn is standard-library Python, reads only the transcript files your tools already write, and auto-detects whichever harnesses it finds (Claude Code and Codex today).
git clone https://github.com/teflon07/token-burn
cd token-burn
python3 burn.py --plan 100
It prints your last 30 days broken down by harness and model, with the list-price-equivalent next to what you pay. Nothing is uploaded.
The fine print, and a moving target
Two caveats on the numbers. They come from our real day-to-day workflow, which already uses memkeeper for retrieval rather than re-reading everything by hand, so this is not a memkeeper-free baseline. And they are a point-in-time snapshot from one setup; as the tooling and our habits change, the figures will move with them.
What the 96% tells us holds regardless: context, not generation, is where an AI coding budget goes. How much of that context is genuinely needed versus re-read out of habit is the open question, and the one we keep poking at, including the obvious test of the same task with retrieval on versus off. Treat these as a running tally rather than a final word. We will keep sharing what we measure as it shifts, so check back.