[Illustration: tokens burning in a fire, with a graph showing savings]

10 Tools to Stop Burning Your Claude Code Tokens

Your context window is leaking. Some ideas to fix it. 🐑

Agent Workflows
Terminal
Editor Workflow
macOS

A hands-on guide to 10 tools that cut Claude Code token usage by 40–98%. Covers installation, usage, and when to use each one.

10 min read

Open a fresh Claude Code session. Type /context. Stare at the number.

That’s how many tokens Claude burned before you typed a single character — reading your CLAUDE.md, loading every enabled skill, ingesting MCP server descriptions, and parsing whatever was left in the context from the last conversation. Now run git status and watch it tick up another 2,000.

You don’t need to spend more to fix this. You need to send less noise.

This guide covers 10 tools that address the problem at different layers — terminal output, LLM responses, codebase navigation, documentation structure, and MCP overhead. Each section shows you exactly how to install and use it.


Quick Reference

| Tool | What it targets | Claimed reduction | Install complexity |
| --- | --- | --- | --- |
| Caveman | Output verbosity | 65–75% | Instant (skill) |
| RTK | Terminal output | 60–90% | Low (binary) |
| Code Review Graph | Codebase reads | 6–49× | Medium (pip) |
| Context Mode | MCP & logs | 98% | Low (plugin) |
| Claude Token Optimizer | Docs structure | 90% | Low (curl) |
| Token Optimizer | Ghost tokens | varies | Low (plugin) |
| Token Optimizer MCP | MCP calls | 95%+ | Medium (npm) |
| Claude Context | Codebase search | 40% | Medium (MCP) |
| Claude Token Efficient | Claude responses | ~63% | Instant (file drop) |
| Token Savior | Code navigation | 97% | Medium (pip/uvx) |

1. Caveman

Repo: JuliusBrussee/caveman — 39.7k stars

The simplest tool on this list. Caveman is a Claude Code skill that makes Claude drop articles, filler words, and pleasantries while keeping all the technical substance. Instead of a 69-token explanation, you get a 19-token one that says exactly the same thing.

Normal Claude:

“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”

Caveman Claude:

“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

Same answer. 19 tokens instead of 69.
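If you want to sanity-check savings like this on your own prompts, a crude four-characters-per-token heuristic is close enough for comparisons (real tokenizers differ, so treat the numbers as rough):

```python
# Crude token estimate: roughly 4 characters per token for English text.
# Real tokenizers differ, but this is close enough to compare a verbose
# answer against a compressed one.
def rough_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

verbose = ("The reason your React component is re-rendering is likely "
           "because you're creating a new object reference on each render cycle.")
caveman = "New object ref each render. Wrap in useMemo."

print(rough_tokens(verbose), rough_tokens(caveman))
```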

Install

claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

Or use the standalone hook installer:

bash <(curl -s https://raw.githubusercontent.com/JuliusBrussee/caveman/main/hooks/install.sh)

Usage

Activate it in any session:

/caveman

Or just say: "talk like caveman", "caveman mode", or "less tokens please".

Disable with: "stop caveman" or "normal mode".

Intensity levels

| Mode | Description |
| --- | --- |
| lite | Drop filler, keep grammar |
| full | Default — fragments, no articles |
| ultra | Maximum compression, telegraphic |
| 文言文 | Classical Chinese compression |

There’s also a caveman-compress companion skill that reduces input tokens in your CLAUDE.md files by ~46%.

💡 Pro tip: Pair caveman mode with long debugging sessions where you just need the fix, not the explanation. Switch back to normal mode when you want reasoning documented in commit messages or PR descriptions.


2. RTK (Rust Token Killer)

Repo: rtk-ai/rtk

RTK is a Rust binary that sits between your shell and Claude Code. When Claude runs git status, it doesn’t get the raw output — it gets a compressed summary. Same with cargo test, npm test, docker ps, and 30+ other commands. A 30-minute session that would consume ~118,000 tokens drops to ~24,000.
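RTK itself is a Rust binary with purpose-built summarizers, but the core idea fits in a few lines. A hypothetical Python sketch (not RTK's actual logic) of collapsing `git status --porcelain` output into a one-line summary:

```python
# Hypothetical sketch of RTK-style output compression (not RTK's real
# implementation): collapse `git status --porcelain` output into counts
# per change type before it reaches the model.
from collections import Counter

def summarize_git_status(porcelain: str) -> str:
    counts = Counter()
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        code = line[:2]  # porcelain status code, e.g. " M", "??", "A "
        if "?" in code:
            counts["untracked"] += 1
        elif "M" in code:
            counts["modified"] += 1
        elif "A" in code:
            counts["added"] += 1
        elif "D" in code:
            counts["deleted"] += 1
        else:
            counts["other"] += 1
    if not counts:
        return "clean"
    return ", ".join(f"{n} {kind}" for kind, n in sorted(counts.items()))

raw = " M src/app.py\n M src/db.py\n?? notes.txt\nA  new.py\n"
print(summarize_git_status(raw))  # 1 added, 2 modified, 1 untracked
```

The model still learns everything it needs to act; it just skips the hundreds of path-per-line tokens.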

⚠️ Warning: There’s an unrelated package also called “rtk” on crates.io. Always use the --git flag when installing via Cargo, or use Homebrew.

This blog already has a deep-dive on RTK: RTK 101: Cut Your Claude Token Usage by 80%. Here’s the quick version:

Install

# Homebrew (recommended)
brew install rtk

# Linux / macOS without Homebrew
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

# Cargo (from source)
cargo install --git https://github.com/rtk-ai/rtk

Wire it to Claude Code

rtk init --global

Restart Claude Code. Done. Every Bash command Claude runs now passes through RTK automatically.

Check your savings

rtk gain             # Summary by command
rtk gain --graph     # ASCII graph (last 30 days)
rtk discover         # Commands still passing through uncompressed

3. Code Review Graph

Repo: tirth8205/code-review-graph

On a large monorepo, Claude reads every relevant file to understand a change. Code Review Graph builds a persistent Tree-sitter AST of your codebase stored in SQLite. When something changes, it computes the blast radius — which functions, classes, and files are actually affected — and hands Claude only those. You go from reading the whole codebase to reading a surgical slice.
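Under the hood, blast radius is essentially a reverse-dependency traversal over the AST-derived graph. A minimal sketch with hypothetical data (the real tool builds this graph with Tree-sitter and persists it in SQLite):

```python
# Minimal blast-radius sketch: given a call graph mapping each function
# to its callers, a change to one function affects its transitive callers.
# The data here is hypothetical, for illustration only.
from collections import deque

callers = {  # function -> functions that call it
    "parse_config": ["load_app", "run_tests"],
    "load_app": ["main"],
    "run_tests": [],
    "main": [],
}

def blast_radius(changed: str) -> set[str]:
    affected, queue = set(), deque([changed])
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, []):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected

print(blast_radius("parse_config"))  # load_app, run_tests, main
```

Only the files containing those affected symbols get handed to Claude; everything else stays out of context.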

  • Code reviews: 6.8× fewer tokens
  • Daily coding tasks: up to 49× fewer tokens
  • Initial indexing: ~10 seconds for a 500-file project
  • Incremental updates: <2 seconds

Supports 23 languages plus Jupyter notebooks.

Install

pip install code-review-graph
code-review-graph install   # auto-detects Claude Code, Cursor, Windsurf, etc.
code-review-graph build     # initial parse of your codebase

Key commands

code-review-graph update          # incremental re-parse of changed files
code-review-graph detect-changes  # risk-scored impact analysis
code-review-graph visualize       # interactive HTML dependency graph
code-review-graph watch           # continuous auto-updates

Once installed, you can ask Claude: “Build the code review graph for this project” — it’ll use the index automatically during reviews.

💡 Pro tip: Run code-review-graph watch in a background terminal during active development. The index stays current and Claude gets the minimal context for every question.


4. Context Mode

Repo: mksglu/context-mode

Context Mode attacks a different problem: raw tool output getting dumped into the context window. When Claude fetches a GitHub issue, runs a long log command, or scrapes a webpage, that data lands in your context and stays there. Context Mode sandboxes that output into SQLite instead.

  • 315 KB of tool output becomes 5.4 KB in context
  • Session continuity across compaction events (tasks, files, and decisions persist)
  • Local-only, no telemetry
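The sandboxing pattern is simple to picture: persist the full output, hand the model a handle plus a short preview. A conceptual Python sketch (not Context Mode's actual schema):

```python
# Conceptual sketch of output sandboxing (not Context Mode's real schema):
# store the full tool output in SQLite, return only a short preview plus
# an ID the model can use to search the rest on demand.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE outputs (id INTEGER PRIMARY KEY, body TEXT)")

def sandbox(body: str, preview_lines: int = 3) -> str:
    cur = db.execute("INSERT INTO outputs (body) VALUES (?)", (body,))
    head = "\n".join(body.splitlines()[:preview_lines])
    return f"[output #{cur.lastrowid}, {len(body)} bytes]\n{head}"

log = "\n".join(f"commit {i}: fix bug #{i}" for i in range(500))
summary = sandbox(log)
print(summary)  # four lines in context instead of 500
```

The real plugin adds FTS5 indexing on top, so the model can query the stored output instead of re-reading it.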

Install for Claude Code

/plugin marketplace add mksglu/context-mode

Key commands

ctx stats     # Show context savings and call counts
ctx doctor    # Diagnose runtimes and FTS5 compatibility
ctx upgrade   # Update and reconfigure
ctx insight   # Analytics dashboard (local web UI)
ctx purge     # Delete indexed content

Core MCP tools available in your session

| Tool | What it does |
| --- | --- |
| ctx_execute | Run code in 11 languages, return stdout only |
| ctx_batch_execute | Multiple commands in one call |
| ctx_fetch_and_index | Fetch URLs, cache 24 hours |
| ctx_index | Chunk markdown into FTS5 with BM25 ranking |
| ctx_search | Query indexed content |

Example: Analyzing 500 commits becomes one tool call returning 5.6 KB instead of 315 KB of raw git log.

Research GitHub repo commits, extract top contributors and frequency
# → ctx_execute handles it, 1 call, 5.6 KB context

5. Claude Token Optimizer

Repo: nadimtuhin/claude-token-optimizer

Most projects load all their documentation at session start. This tool restructures that. It separates “always-load” documentation (startup, ~800 tokens) from “load-on-demand” documentation (the rest, 0 tokens until referenced).

Before: 8,000 tokens at startup, 11,000 total. After: 800 tokens at startup, 1,300 total.
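The resulting entry-point CLAUDE.md is little more than a routing table. A hypothetical example of the shape (the init script generates its own version for your stack):

```markdown
# Project: acme-api (Express)
Read COMMON_MISTAKES.md before touching auth or DB code.
Commands: QUICK_START.md. Layout: ARCHITECTURE_MAP.md.
Deep dives live in docs/ — load only when a task references them.
```

Claude pays for these few lines at every session start; everything behind the links costs nothing until it is actually needed.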

Install

Run this in your project root:

curl -fsSL https://raw.githubusercontent.com/nadimtuhin/claude-token-optimizer/main/init.sh | bash

The script asks about your project type (Express, Next.js, Vue, Django, Rails, etc.) and takes ~2 minutes to scaffold the structure.

What it creates

CLAUDE.md            # Primary entry point (~50 tokens)
COMMON_MISTAKES.md   # Top 5 critical bugs (~350 tokens)
QUICK_START.md       # Frequent commands (~100 tokens)
ARCHITECTURE_MAP.md  # Code organization (~150 tokens)
.claude/             # Extended docs — loaded only when referenced
docs/                # Deep dives — loaded only when referenced

Maintenance habit

Add any bug that takes you more than an hour to track down to COMMON_MISTAKES.md. Next time, Claude reads it before guessing.

⚠️ Warning: The savings depend entirely on your current documentation being verbose. If you already have a lean CLAUDE.md, this tool has less impact.


6. Token Optimizer

Repo: alexgreensh/token-optimizer

This one goes hunting for what the project calls “ghost tokens” — token waste that’s invisible in normal usage: bloated CLAUDE.md files with stale content, unused skills still registered and loading, duplicate system prompts, MCP server descriptions for tools you’ve removed, and MEMORY.md entries beyond line 200 that Claude can’t actually access.

It provides a real-time quality score and automated compression recommendations.

Install

Claude Code plugin:

/plugin marketplace add alexgreensh/token-optimizer
/plugin install token-optimizer@alexgreensh-token-optimizer

Manual:

git clone https://github.com/alexgreensh/token-optimizer.git ~/.claude/token-optimizer
bash ~/.claude/token-optimizer/install.sh

Key commands

python3 measure.py quick                    # 10-second health check
python3 measure.py quality                  # 7-signal degradation tracking
python3 measure.py doctor                   # Installation health check (0–10 score)
python3 measure.py memory-review            # Audit MEMORY.md for orphans
python3 measure.py attention-score          # CLAUDE.md attention-curve alignment
python3 measure.py drift                    # Config growth vs. baseline
python3 measure.py savings                  # Dollar savings report
python3 measure.py dashboard --serve        # Local analytics dashboard

💡 Pro tip: Run python3 measure.py doctor after any significant change to your settings or CLAUDE.md — it catches invisible waste before it compounds across sessions.


7. Token Optimizer MCP

Repo: ooples/token-optimizer-mcp

Token Optimizer MCP applies caching and compression directly at the MCP layer. When Claude calls an MCP tool, the server intercepts the response and strips redundant content before it enters the context. The project claims 95%+ reduction on MCP-heavy workflows.
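The repository does not document its internals, so treat this as a sketch of the general MCP-layer caching pattern rather than the project's implementation:

```python
# Hypothetical sketch of MCP-layer response caching (the repo does not
# document its internals): identical tool calls within a TTL return the
# cached, compressed response instead of re-entering context at full size.
import hashlib, json, time

_cache: dict[str, tuple[float, str]] = {}
TTL = 300  # seconds

def call_tool(name: str, args: dict, upstream) -> str:
    key = hashlib.sha256(json.dumps([name, args], sort_keys=True).encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL:
        return hit[1]
    response = upstream(name, args)
    compressed = " ".join(response.split())  # stand-in for real compression
    _cache[key] = (time.time(), compressed)
    return compressed

calls = []
def fake_upstream(name, args):
    calls.append(name)
    return "lots   of\n\n  whitespace   heavy   output"

first = call_tool("search", {"q": "auth"}, fake_upstream)
second = call_tool("search", {"q": "auth"}, fake_upstream)
print(first, len(calls))  # upstream hit only once
```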

⚠️ Note: The README is sparse on implementation details. Test the tool yourself before committing it to your stack; the core concept is sound, but maturity is unclear.

Install

git clone https://github.com/ooples/token-optimizer-mcp.git
cd token-optimizer-mcp
npm install
npm run build

Configure in your claude.json or .mcp.json following the server config in the repository’s server.json.


8. Claude Context

Repo: zilliztech/claude-context

From Zilliz (the Milvus vector database company). Claude Context adds an MCP server that indexes your codebase using hybrid search — BM25 keyword matching combined with dense vector embeddings. Instead of reading files to answer a question about your code, Claude queries the index with natural language and gets back only the relevant chunks.

Claims ~40% token reduction vs. traditional full-file reads.
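Hybrid search merges the two rankings into one score. A simplified weighted-sum sketch (illustrative only; this is not necessarily the fusion method Zilliz uses):

```python
# Simplified hybrid-scoring sketch (not necessarily Zilliz's fusion logic):
# normalize a BM25 keyword score and a vector-similarity score to [0, 1],
# then rank chunks by the weighted sum.
def hybrid_rank(chunks, alpha=0.5):
    """chunks: list of (name, bm25_score, cosine_similarity)."""
    def norm(scores):
        hi = max(scores) or 1.0
        return [s / hi for s in scores]
    bm25 = norm([c[1] for c in chunks])
    dense = norm([c[2] for c in chunks])
    scored = [(c[0], alpha * b + (1 - alpha) * d)
              for c, b, d in zip(chunks, bm25, dense)]
    return sorted(scored, key=lambda x: -x[1])

chunks = [("auth.py:login", 8.2, 0.91),
          ("db.py:connect", 9.5, 0.40),
          ("auth.py:verify_token", 6.1, 0.88)]
print(hybrid_rank(chunks)[0][0])  # auth.py:login
```

Only the top-ranked chunks enter the context, which is where the claimed 40% saving over full-file reads comes from.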

Prerequisites

  • Zilliz Cloud account (free tier available) — for the vector database
  • OpenAI API key — for generating embeddings

Install for Claude Code

claude mcp add claude-context \
  -e OPENAI_API_KEY=sk-your-key \
  -e MILVUS_TOKEN=your-zilliz-token \
  -- npx @zilliz/claude-context-mcp@latest

Usage

  1. Open Claude Code in your project: cd your-project && claude
  2. Index your codebase: “Index this codebase”
  3. Check status: “Check the indexing status”
  4. Search: “Find functions that handle user authentication”

Available tools

| Tool | What it does |
| --- | --- |
| index_codebase | Index a directory for hybrid search |
| search_code | Natural language query over indexed code |
| get_indexing_status | Monitor indexing progress |
| clear_index | Remove a codebase index |

⚠️ Warning: This tool requires two external API keys (Zilliz + OpenAI). There’s a cost dependency beyond just Claude — factor that in before adopting it.


9. Claude Token Efficient

Repo: drona23/claude-token-efficient

The simplest install on this list: drop a CLAUDE.md into your repo. The file tells Claude to skip filler phrases, avoid re-reading unchanged files, prefer targeted edits over full rewrites, and omit preamble and closing pleasantries. The project measures ~63% output token reduction in test cases.

The file addresses eight default Claude behaviors that waste tokens without adding value: verbose explanations, unnecessary file rewrites, sycophantic chatter, over-engineered solutions, and more.

Install

Option 1 — direct download:

curl -o CLAUDE.md https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md

Option 2 — pick a profile:

git clone https://github.com/drona23/claude-token-efficient
cp claude-token-efficient/profiles/CLAUDE.coding.md your-project/CLAUDE.md

Option 3 — paste into chat: Copy the contents and paste them directly into any Claude session for one-off use.

The file works on an override principle — user instructions always win. Ask for a detailed explanation and you’ll get one.

Bonus: Matt Pocock’s one-liner

If you want an extra low-effort win, copy this simple rule from Matt Pocock:

“In all interactions and commit messages, be extremely concise and sacrifice grammar for the sake of concision.”

It’s not sophisticated, but it works surprisingly well to reduce verbosity and keep responses tighter.

💡 Pro tip: Merge this file with your existing CLAUDE.md rather than replacing it. Pick the rules that match your actual pain points and leave the rest.


10. Token Savior

Repo: Mibayy/token-savior

Token Savior is an MCP server with two capabilities: symbol-based code navigation and persistent memory across sessions.

Symbol navigation: Instead of reading entire files to find a function, Token Savior indexes your codebase by symbol. Finding a symbol goes from injecting 41 million characters to injecting 67 characters — a 99.9% reduction. Getting a function’s source is a direct 4.5K-char lookup.
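The mechanism behind numbers like that is a symbol table: map every function and class to a file:line pointer, and return the pointer instead of the file. A minimal sketch using Python's `ast` module (hypothetical; Token Savior's own indexer covers more languages):

```python
# Minimal symbol-index sketch using Python's ast module (hypothetical;
# Token Savior's indexer covers more languages). Maps each function or
# class name to a location, so a lookup returns tens of characters
# instead of whole files.
import ast

def index_symbols(source: str, filename: str) -> dict[str, str]:
    symbols = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols[node.name] = f"{filename}:{node.lineno}"
    return symbols

source = """
class Session:
    def login(self, user):
        return True

def audit(event):
    pass
"""
print(index_symbols(source, "auth.py"))
```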

Persistent memory: A SQLite-backed engine with FTS5 full-text search stores decisions, bugfixes, and conventions across sessions. Three-layer retrieval keeps even memory lookups lean: index first (~15 tokens), search second (~60 tokens), full fetch only when needed (~200 tokens).

Benchmark results across 170+ real sessions: 118/120 (98%) vs. plain Claude Code’s 67/120 (56%).

| Metric | Plain Claude | Token Savior | Delta |
| --- | --- | --- | --- |
| Active tokens | 1.02M | 614K | −40% |
| Wall time | 51 min | 28 min | −46% |
| Benchmark score | 67/120 | 118/120 | +42 pts |

Install

Quickest (no venv needed):

uvx token-savior-recall

Via pip:

pip install "token-savior-recall[mcp]"

# With vector search support:
pip install "token-savior-recall[mcp,memory-vector]"

Register with Claude Code:

claude mcp add token-savior -- /path/to/venv/bin/token-savior

Or manually in claude.json:

{
  "mcpServers": {
    "token-savior-recall": {
      "command": "/path/to/venv/bin/token-savior",
      "env": {
        "WORKSPACE_ROOTS": "/path/to/project1,/path/to/project2",
        "TOKEN_SAVIOR_CLIENT": "claude-code"
      }
    }
  }
}

Memory retrieval pattern

Always start lean:

Layer 1: memory_index      →  ~15 tokens/result (always start here)
Layer 2: memory_search     →  ~60 tokens/result (only if L1 matched)
Layer 3: memory_get        →  ~200 tokens/result (final confirmation)
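The escalation discipline reads as a guard chain: each layer runs only if the cheaper one matched. A sketch (function names here are illustrative, not Token Savior's API):

```python
# Hypothetical sketch of three-layer retrieval: cheap index scan first,
# keyword search only on a hit, full fetch only for the final candidate.
def recall(query, index, search, get):
    ids = index(query)            # ~15 tokens/result
    if not ids:
        return None               # stop: nothing cheap matched
    hits = search(query, ids)     # ~60 tokens/result
    if not hits:
        return None
    return get(hits[0])           # ~200 tokens: one full record

memory = {1: "auth bugfix: token refresh race", 2: "db: use pool size 10"}
result = recall(
    "token refresh",
    index=lambda q: [i for i, t in memory.items() if q.split()[0] in t],
    search=lambda q, ids: [i for i in ids if q in memory[i]],
    get=lambda i: memory[i],
)
print(result)
```

Most queries die at layer 1 for a handful of tokens; only genuine hits ever pay the full 200-token fetch.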

The God-Tier Stack

No single tool solves everything. Pick 2–3 based on where you’re actually bleeding.

If your problem is terminal output noise

RTK is the answer. One binary, one hook install, zero behavior change required. Every git, test, and build command Claude runs gets compressed automatically. Start here — it’s the easiest ROI on this list.

If your problem is large codebases

Code Review Graph + Token Savior together. Code Review Graph handles code reviews by computing blast radius. Token Savior handles everything else by navigating by symbol instead of by file. Combined, you’re not reading files anymore — you’re querying indices.

If your problem is MCP tool dumps

Context Mode is the right layer. It sandboxes raw tool output into SQLite before it touches your context window. A GitHub issue fetch goes from polluting your context with 300 KB to handing you a 5 KB summary.

If you want a zero-cost immediate win

Caveman + Claude Token Efficient require no infrastructure, no API keys, no accounts. Drop a CLAUDE.md, install a skill, and your next session already spends ~60% fewer tokens on output. Takes under two minutes.

The full stack (if you’re serious about it)

Caveman              ← Output verbosity
RTK                  ← Terminal noise
Context Mode         ← MCP dumps
Code Review Graph    ← Code review reads
Token Savior         ← Code navigation + memory
Claude Token Efficient ← CLAUDE.md baseline

Run /context in a fresh session before and after. The difference is real.

