Every Claude CLI call costs money. Input tokens, output tokens, and cache operations all have a price. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is essential before you run anything at scale. This chapter breaks down exactly where your dollars go and how to keep them under control.
Token Pricing (March 2026)
| Token Type | Opus (per 1M) | Sonnet (per 1M) |
|---|---|---|
| Input tokens | $15.00 | $3.00 |
| Output tokens | $75.00 | $15.00 |
| 5-min cache write | $18.75 | $3.75 |
| 1-hour cache write | $30.00 | $6.00 |
| Cache read | $1.50 (90% off!) | $0.30 (90% off!) |
The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.
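To make the effect of the cache read rate concrete, here is a minimal sketch that prices a single call from its token counts. The rates are copied from the table above; the `call_cost` helper and the example token counts are illustrative assumptions, not part of the CLI.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "opus": {
        "input": 15.00, "output": 75.00,
        "cache_write_5m": 18.75, "cache_write_1h": 30.00, "cache_read": 1.50,
    },
    "sonnet": {
        "input": 3.00, "output": 15.00,
        "cache_write_5m": 3.75, "cache_write_1h": 6.00, "cache_read": 0.30,
    },
}


def call_cost(model: str, tokens: dict[str, int]) -> float:
    """Return the USD cost of one call given token counts per token type."""
    rates = PRICES[model]
    return sum(count * rates[kind] / 1_000_000 for kind, count in tokens.items())


# Hypothetical call: a 10,000-token prefix and a 500-token answer.
fresh = call_cost("opus", {"input": 10_000, "output": 500})
cached = call_cost("opus", {"cache_read": 10_000, "output": 500})
print(f"uncached: ${fresh:.4f}, cached: ${cached:.4f}")
```

In this example the output tokens cost the same either way; only the prefix gets cheaper, which is why caching pays off most on calls with a long prompt and a short answer.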
The Minimum Per-Call Cost
Every call pays a “tax” for the system prompt, regardless of what you actually ask:
- Opus: ~$0.016 minimum per call
- Sonnet: ~$0.005 minimum per call
The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.
This has a direct consequence: --max-budget-usd 0.01 will always trigger error_max_budget_usd with Opus because the system prompt alone exceeds the budget. Set a minimum of $0.02 for Opus, or better yet, $0.10 to give Claude room to actually answer.
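A script can catch an impossible budget before making the call. The sketch below hard-codes the per-call minimums quoted in this chapter; the `validate_budget` helper is hypothetical and the figures should be re-measured if the system prompt changes.

```python
# Per-call minimums in USD as stated in this chapter; not queried from the CLI.
MIN_CALL_COST = {"opus": 0.016, "sonnet": 0.005}


def validate_budget(model: str, budget_usd: float) -> float:
    """Return the headroom left for the actual answer, or raise if there is none."""
    minimum = MIN_CALL_COST[model]
    if budget_usd <= minimum:
        raise ValueError(
            f"${budget_usd} cannot cover the ~${minimum} system prompt cost for {model}"
        )
    return budget_usd - minimum


print(f"headroom at $0.10 on Opus: ${validate_budget('opus', 0.10):.3f}")
```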
Budget Is Not a Hard Cap
The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:
- Claude starts generating a response
- The response completes (full turn)
- Then the budget is checked
- If exceeded, the next turn does not start
This means a single turn can blow past the budget by any amount. A $0.001 budget can result in $0.15 of spend if the first turn generates a long response.
--max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. If you set a budget of $0.001 and ask for a 5,000-word essay, Claude will write the entire essay before checking the budget. Always set budgets at least 10x your expected per-turn cost.
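The between-turn check can be modeled in a few lines. This is a simulation of the behavior described above under assumed per-turn costs, not the CLI's actual implementation; `run_session` and the cost figures are illustrative.

```python
def run_session(turn_costs: list[float], budget_usd: float) -> tuple[float, int]:
    """Simulate a budget that is only checked after each completed turn.

    Returns (total spend, number of turns that ran).
    """
    spent = 0.0
    turns_run = 0
    for cost in turn_costs:
        # The turn always runs to completion; nothing interrupts it mid-generation.
        spent += cost
        turns_run += 1
        # Only now is the running total compared against the budget.
        if spent >= budget_usd:
            break
    return spent, turns_run


# A $0.001 budget with an expensive first turn (assumed cost: $0.15).
spent, turns = run_session([0.15, 0.05, 0.05], budget_usd=0.001)
print(f"budget $0.001, spent ${spent:.2f} in {turns} turn(s)")
```

The budget stops the second turn from starting, but it has no effect on the first one, so the overshoot is bounded only by how expensive a single turn can get.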
Here is a real response from setting a $0.001 budget and asking for a long essay:
The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.
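In a script, that check might look like the following sketch. It relies only on the subtype and is_error fields named above; the sample payload is a hypothetical, trimmed stand-in for illustration and not real CLI output.

```python
import json


def stopped_on_budget(raw_json: str) -> bool:
    """Return True when a JSON result reports a budget stop via its subtype."""
    result = json.loads(raw_json)
    return result.get("subtype") == "error_max_budget_usd"


# Hypothetical, trimmed payload; a real response carries more fields.
sample = '{"subtype": "error_max_budget_usd", "is_error": false}'

if stopped_on_budget(sample):
    # is_error is false here, so a check on is_error alone would miss this case.
    print("budget limit reached; output may be incomplete")
```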
Cache Economics
Caching is where the real savings happen. Here is what a typical call looks like under the hood.
First call to a new session:
- ~14K tokens read from cache (system prompt, cached from a previous session)
- ~1.4K tokens written to cache (new session-specific content)
Subsequent calls (same session or same system prompt):
- Almost everything hits cache reads at the 0.1x rate
- Only new content creates cache entries
Savings calculation for a 10-turn Opus session:
| | Without Caching | With Caching |
|---|---|---|
| Cost | 10 x 15K input tokens x $15/M = $2.25 | 1 x 15K cache write (1-hour rate, $30/M) + 9 x 15K cache reads ($1.50/M) = $0.65 |
| Savings | n/a | 71% |
The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.
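The arithmetic behind the 10-turn comparison can be reproduced with a short sketch. It assumes the 1-hour cache write rate ($30/M), which is the rate that yields the $0.65 figure; with the 5-minute rate the cached total would be lower. The `session_costs` helper is illustrative.

```python
def session_costs(turns: int, tokens_per_turn: int,
                  input_rate: float, write_rate: float, read_rate: float):
    """Return (uncached cost, cached cost, fractional savings) in USD.

    Rates are USD per 1M tokens. The cached case pays one cache write on the
    first turn and cache reads on every later turn.
    """
    uncached = turns * tokens_per_turn * input_rate / 1_000_000
    cached = (tokens_per_turn * write_rate
              + (turns - 1) * tokens_per_turn * read_rate) / 1_000_000
    return uncached, cached, 1 - cached / uncached


# Opus rates from the pricing table; 1-hour cache write assumed.
uncached, cached, savings = session_costs(10, 15_000, 15.00, 30.00, 1.50)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, savings {savings:.0%}")
```

Because the single cache write is the dominant term in the cached total, the savings percentage keeps growing as the session gets longer.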
5 Cost Optimization Strategies
- Use Sonnet for simple tasks: run claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is 5x cheaper per token. Reserve Opus for complex reasoning.
- Disable tools for pure text tasks: run claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens.
- Use --effort low for simple questions: run claude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens.
- Set realistic budget caps: minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.
- Batch similar prompts to maximize cache hits: run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.
Industry Benchmarks
Cost Benchmarks
| Metric | Value |
|---|---|
| Average per dev/day | $6 |
| 90th percentile daily | $12 |
| Monthly estimate | $100-200/dev |
| 100-turn Opus (no cache) | $50-100 |
| 100-turn Opus (with cache) | $10-19 |
The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.
Track costs per call by piping to jq: claude -p "…" --output-format json | jq '{cost: .total_cost_usd, tokens: .usage.output_tokens}'
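For a batch of calls, the same two fields can be totaled in a script. This sketch assumes each result is the JSON text of one call and reads only total_cost_usd and usage.output_tokens; the sample payloads are hypothetical values for illustration, not real CLI output.

```python
import json


def summarize(results: list[str]) -> dict:
    """Total cost and output tokens across JSON results, one string per call."""
    total_cost = 0.0
    total_output = 0
    for raw in results:
        data = json.loads(raw)
        total_cost += data.get("total_cost_usd", 0.0)
        total_output += data.get("usage", {}).get("output_tokens", 0)
    return {
        "calls": len(results),
        "cost_usd": round(total_cost, 6),
        "output_tokens": total_output,
    }


# Hypothetical, trimmed payloads; real responses carry more fields.
samples = [
    '{"total_cost_usd": 0.016, "usage": {"output_tokens": 12}}',
    '{"total_cost_usd": 0.150, "usage": {"output_tokens": 1800}}',
]
print(summarize(samples))
```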