LLM API spend is the new shadow cloud bill. Every tool call your agent makes is metered tokens — context in, completion out — and most of that context is repeated file reads and shell output the model already saw five minutes ago. The three tools below attack that waste at the source: cache it, graph it, or compress it before it ever hits the wire.
CodeGraph
Summary. CodeGraph pre-indexes your repository into a semantic knowledge graph that Claude Code, Cursor, Codex, OpenCode, and Hermes Agent can query instead of grepping the filesystem token by token.
Maintained by colbymchenry.
Pitches ~35% cheaper runs and ~70% fewer tool calls, per the project README.
100% local — the graph lives on your machine, no third-party index service.
One-line installer for macOS, Linux, and Windows. No Node.js required.
Use case. Cut the per-prompt context tax on AI coding agents by replacing fan-out file reads with a single graph lookup.
Imagine a 400-file service repo where every "find the callers of X" question in Cursor triggers a dozen grep and read_file tool calls. With CodeGraph indexing the symbol graph ahead of time, the agent asks the graph once, gets a structured answer, and pays for the tokens of that answer instead of the tokens of twelve speculative reads. If your team is on Bedrock, Anthropic, or OpenAI pay-per-token plans, that delta lands directly on the AWS or vendor invoice you reconcile each month — not in some hand-wavy "developer productivity" bucket.
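To make "index once, query once" concrete, here is a minimal sketch of a name-based caller index built with Python's standard `ast` module. It is not CodeGraph's implementation or API. `build_caller_index` and the three-file toy repo are hypothetical, and a real symbol graph would also resolve imports, types, and multiple languages.

```python
import ast
from collections import defaultdict


def build_caller_index(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each called name to the set of "file:function" sites that call it.

    Hypothetical helper. Calls are keyed by their final name only
    (`foo()` and `obj.foo()` both count as "foo"), so this is a coarse,
    name-based index rather than a resolved symbol graph.
    """
    callers: dict[str, set[str]] = defaultdict(set)
    for path, text in sources.items():
        tree = ast.parse(text, filename=path)
        for func in ast.walk(tree):
            if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            # Note: walking a function also visits any functions nested in it.
            for node in ast.walk(func):
                if not isinstance(node, ast.Call):
                    continue
                if isinstance(node.func, ast.Name):
                    name = node.func.id
                elif isinstance(node.func, ast.Attribute):
                    name = node.func.attr
                else:
                    continue  # e.g. calls on subscripts or lambdas
                callers[name].add(f"{path}:{func.name}")
    return dict(callers)


# Index once up front (the expensive part) ...
repo = {
    "billing.py": "def charge(cart):\n    return apply_tax(cart.total)\n",
    "tax.py": "def apply_tax(amount):\n    return amount * 1.2\n",
    "api.py": "def checkout(cart):\n    return billing.charge(cart)\n",
}
index = build_caller_index(repo)

# ... then each "who calls X?" question is a dictionary lookup, not a grep.
print(sorted(index["apply_tax"]))  # ['billing.py:charge']
print(sorted(index["charge"]))     # ['api.py:checkout']
```

The design point is where the cost lands: parsing every file happens once, ahead of time, and each later question returns only the small answer instead of every file the agent would otherwise have read.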
The honest FinOps test: turn on usage tracking for one sprint, install CodeGraph for half the team, and compare token spend per merged PR. If the README numbers hold even loosely, the savings should be obvious in the billing console.
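Once you have a usage export, the per-PR comparison is a few lines. A minimal sketch, assuming you can attribute input tokens, output tokens, and tool calls to each merged PR and tag each PR with its author's cohort. `PRUsage`, `per_pr_summary`, the sample rows, and the per-million-token rates are all hypothetical placeholders, not any vendor's export schema or pricing.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PRUsage:
    """One merged PR and the agent usage attributed to it (hypothetical schema)."""
    pr_id: str
    cohort: str  # e.g. "codegraph" or "control"
    input_tokens: int
    output_tokens: int
    tool_calls: int


def per_pr_summary(
    rows: list[PRUsage], input_usd_per_mtok: float, output_usd_per_mtok: float
) -> dict[str, dict[str, float]]:
    """Average cost, tokens, and tool calls per merged PR, grouped by cohort."""
    by_cohort: dict[str, list[PRUsage]] = {}
    for row in rows:
        by_cohort.setdefault(row.cohort, []).append(row)

    def cost(r: PRUsage) -> float:
        # Rates are per million tokens; input and output are priced separately.
        return (r.input_tokens * input_usd_per_mtok
                + r.output_tokens * output_usd_per_mtok) / 1_000_000

    return {
        cohort: {
            "prs": len(prs),
            "usd_per_pr": mean(cost(r) for r in prs),
            "tokens_per_pr": mean(r.input_tokens + r.output_tokens for r in prs),
            "tool_calls_per_pr": mean(r.tool_calls for r in prs),
        }
        for cohort, prs in by_cohort.items()
    }


# Made-up numbers and placeholder rates; substitute your own export and contract pricing.
rows = [
    PRUsage("101", "codegraph", 200_000, 20_000, 40),
    PRUsage("102", "codegraph", 300_000, 30_000, 60),
    PRUsage("201", "control", 500_000, 40_000, 120),
    PRUsage("202", "control", 400_000, 30_000, 100),
]
summary = per_pr_summary(rows, input_usd_per_mtok=3.0, output_usd_per_mtok=15.0)
for cohort, stats in summary.items():
    print(cohort, stats)
```

Comparing `usd_per_pr` and `tool_calls_per_pr` across the two cohorts is the A/B in miniature. Pricing input and output separately matters because completion tokens usually cost more than context tokens, so a tool that trims context and one that trims completions move the invoice differently.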
lean-ctx
Summary. lean-ctx is a Rust binary that compresses file and shell output before it ever reaches Cursor, Claude Code, Copilot, Windsurf, Codex, or Gemini.
Maintained by yvgude, with contributions from glemsom and frpboy.
Ships as a single Rust binary — shell hook or MCP server, your pick.
README claims 60–95% token reduction, up to 99% on cached reads.
59 tools, 10 read modes, and 95+ compression patterns, per the project description.
Use case. Sits between your terminal and your agent and strips the cruft — ANSI codes, repeated lines, low-signal stack frames — before tokens get billed.
Imagine an agent debugging a CloudFormation deploy. It tails the stack events, runs aws cloudformation describe-stack-events, then re-reads a 4,000-line Lambda log. Those three calls can easily total 50k tokens, a large share of it whitespace, repeated lines, and ANSI escape sequences. With lean-ctx in the path, the agent sees a compressed view and the model bill drops accordingly. The Rust binary also keeps the hook itself cheap: there is no Node.js process to spin up on every shell invocation.
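The core idea is easy to show in miniature. The sketch below is not lean-ctx, which is written in Rust and ships its own read modes and compression patterns; it is a hypothetical Python filter illustrating two of the transformations named above: stripping ANSI escape sequences and collapsing runs of identical lines. The regex covers common CSI sequences (colors, cursor movement) only, and `compress` is an illustrative name.

```python
import re

# Common ANSI CSI escape sequences (colors, cursor movement), e.g. "\x1b[31m".
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def compress(text: str) -> str:
    """Strip ANSI codes and trailing whitespace, and collapse runs of identical lines."""
    out: list[str] = []
    prev = None
    repeats = 0

    def flush() -> None:
        # Replace a run of duplicates with a one-line marker.
        if repeats:
            out.append(f"  [{repeats} identical line(s) omitted]")

    for raw in text.splitlines():
        line = ANSI_RE.sub("", raw).rstrip()
        if line == prev:
            repeats += 1
            continue
        flush()
        out.append(line)
        prev, repeats = line, 0
    flush()
    return "\n".join(out)


# Colored "OK", a retry loop, then success: noisy output an agent might read.
sample = "\x1b[32mOK\x1b[0m  \nretrying...\nretrying...\nretrying...\ndone\n"
print(compress(sample))
```

A filter like this only pays off if it runs before the agent reads the output, which is why lean-ctx positions itself as a shell hook or MCP server rather than a post-processing step.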
If you're running a small platform team where every engineer has Cursor or Claude Code wired to a corporate Anthropic key, this is the lowest-effort intervention on the list. Drop the binary in, point your shell hook at it, watch the daily token graph.
Graphify
Summary. Graphify turns any folder — app code, SQL schemas, R scripts, shell scripts, docs, papers, images, video — into a queryable knowledge graph your AI coding assistant can hit with a /graphify slash command.
Maintained by safishamsi.
Works with Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and others.
Unique angle: one graph spans app code + database schema + infrastructure — not just source.
Accepts a broader mix of input types than either of the other two tools in this post.
Use case. Give the agent one place to ask "what touches this table?" or "what infra backs this endpoint?" instead of letting it shotgun-read your Terraform, your migrations, and your handlers separately.
Imagine answering "if we drop column customer_status, what breaks?" The naive agent loop reads every .sql file, every model file, and every IaC module, then re-reads them when it loses focus. With a pre-built Graphify index, that's a graph traversal, not a tour of your repo. For cost-attribution work specifically (mapping a workload to its team, its Terraform module, and its CloudWatch namespace), having app code, schema, and infra in one graph is genuinely useful: those relationships already exist across your code, migrations, and IaC whether or not anyone has written them down, and the graph makes them queryable.
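Once code, schema, and infra live in one graph, "what breaks?" reduces to a reachability query. A minimal sketch, assuming a hypothetical edge list in which each node (column, SQL file, model, handler, Terraform module) points to the nodes that depend on it. The node names, `DEPENDENTS`, and `blast_radius` are illustrative and are not Graphify's data model or query interface.

```python
from collections import deque

# Hypothetical edges: if A depends on B, the graph stores B -> [A],
# so walking forward from a node answers "what is affected if this changes?".
DEPENDENTS: dict[str, list[str]] = {
    "column:customers.customer_status": [
        "sql:views/active_customers.sql",
        "model:app/models/customer.py",
    ],
    "sql:views/active_customers.sql": ["handler:api/reports.py"],
    "model:app/models/customer.py": [
        "handler:api/customers.py",
        "handler:api/reports.py",
    ],
    "handler:api/reports.py": ["infra:terraform/modules/reports_lambda"],
}


def blast_radius(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first search over dependents; returns each reachable node once."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in edges.get(node, []):
            if dependent not in seen:  # shared dependents are reported only once
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


impacted = blast_radius("column:customers.customer_status", DEPENDENTS)
for node in impacted:
    print(node)
```

The agent pays for five node names instead of every file it would have opened to discover them, and the `seen` set keeps a shared dependent like the reports handler from being visited twice.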
Graphify's published material gives no token-reduction number, so don't promise leadership a percentage. Treat it as a structural bet: less re-reading, fewer tool calls, smaller invoice.
Pattern, not coincidence. All three tools attack the same FinOps problem from slightly different angles:
lean-ctx compresses the bytes after the agent decides what to look at.
CodeGraph changes what the agent decides to look at in the first place.
Graphify widens "look at" to include schema and infra, not just code.
You can stack them. A graph-aware agent that also runs its shell output through a compressor pays for a much smaller token surface than either tool delivers on its own.
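A quick way to sanity-check a stacking claim before it reaches a slide: if each tool's savings were independent, the fraction of spend that remains is the product of (1 - r) over each tool's reduction r. The sketch below encodes that assumption. `stacked_remaining` is a hypothetical helper, and the inputs are README claims, not measurements.

```python
from math import prod


def stacked_remaining(reductions: list[float]) -> float:
    """Fraction of the original spend left after applying each reduction.

    Assumes the reductions are independent and multiply, which is an
    optimistic simplification: fewer reads also means less output left
    for the compressor to shrink, so real savings overlap.
    """
    for r in reductions:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reduction must be in [0, 1], got {r}")
    return prod(1.0 - r for r in reductions)


# Illustrative only: CodeGraph's ~35% README figure and the low end of
# lean-ctx's 60-95% README claim.
remaining = stacked_remaining([0.35, 0.60])
print(f"{remaining:.0%} of the original spend")  # 26% of the original spend
```

Treat the result as a ceiling on savings, not a forecast: lean-ctx's reduction applies only to the tool output it sees, not to your prompts or the model's completions, and the two tools shrink overlapping portions of the same context.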
One decision this week. Pick the repo where AI-assisted spend is highest (usually the largest monorepo with the most active Cursor or Claude Code users) and run a one-week A/B with CodeGraph on half the team. Measure tool-call count and token spend per PR. If the numbers land anywhere near the README's claims and the tool saves your org real money on the Anthropic or Bedrock invoice, sponsor colbymchenry for CodeGraph, and yvgude for lean-ctx while you're at it; this category of work only continues if the maintainers can afford to keep shipping.



