What It Costs

Understand token pricing, cache economics, and budget controls

Every Claude CLI call costs money. Input tokens, output tokens, and cache operations all have a price. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is essential before you run anything at scale. This chapter breaks down exactly where your dollars go and how to keep them under control.

Token Pricing

Token Pricing (March 2026)

| Token Type | Opus (per 1M) | Sonnet (per 1M) |
|---|---|---|
| Input tokens | $15.00 | $3.00 |
| Output tokens | $75.00 | $15.00 |
| 5-min cache write | $18.75 | $3.75 |
| 1-hour cache write | $30.00 | $6.00 |
| Cache read | $1.50 (90% off) | $0.30 (90% off) |

The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.
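To make the table concrete, here is a minimal sketch that turns token counts into dollars. The rates are copied from the table above; the function name `call_cost`, its parameters, and the example token counts are illustrative assumptions, not part of any Claude tooling.

```python
# Per-million-token rates in USD, copied from the pricing table above.
RATES = {
    "opus": {"input": 15.00, "output": 75.00, "cache_write_5m": 18.75,
             "cache_write_1h": 30.00, "cache_read": 1.50},
    "sonnet": {"input": 3.00, "output": 15.00, "cache_write_5m": 3.75,
               "cache_write_1h": 6.00, "cache_read": 0.30},
}


def call_cost(model, input_tokens=0, output_tokens=0, cache_read_tokens=0,
              cache_write_5m_tokens=0, cache_write_1h_tokens=0):
    """Return the estimated cost in USD of one call (hypothetical helper)."""
    r = RATES[model]
    per_million_total = (
        input_tokens * r["input"]
        + output_tokens * r["output"]
        + cache_read_tokens * r["cache_read"]
        + cache_write_5m_tokens * r["cache_write_5m"]
        + cache_write_1h_tokens * r["cache_write_1h"]
    )
    return per_million_total / 1_000_000


# Made-up call: 10K prompt tokens, half of them served from cache, 1K output tokens.
opus = call_cost("opus", input_tokens=5_000, cache_read_tokens=5_000,
                 output_tokens=1_000)
sonnet = call_cost("sonnet", input_tokens=5_000, cache_read_tokens=5_000,
                   output_tokens=1_000)
print(f"Opus: ${opus:.4f}  Sonnet: ${sonnet:.4f}")
```

With these inputs the cached half of the prompt adds only a small fraction of what the uncached half costs, which is the effect the cache read row describes.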

The Minimum Per-Call Cost

Every call pays a “tax” for the system prompt, regardless of what you actually ask:

  • Opus: ~$0.016 minimum per call
  • Sonnet: ~$0.005 minimum per call

The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.

This has a direct consequence: --max-budget-usd 0.01 will always trigger error_max_budget_usd with Opus because the system prompt alone exceeds the budget. Set a minimum of $0.02 for Opus, or better yet, $0.10 to give Claude room to actually answer.
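The ~$0.016 figure can be sanity-checked by pricing the cache traffic of a minimal call. The sketch below is a reconstruction under stated assumptions, not an official formula. It takes the token counts from the budget-overshoot response recorded later in this chapter (14,253 cache-read tokens and 1,416 cache-write tokens). The per-million rates are inferred rather than quoted: $5 input, $25 output, $0.50 cache read, and $6.25 cache write are the values that reproduce that response's recorded costUSD of 0.1521415, and they differ from the Opus column of the pricing table, so treat them as an assumption about how that particular call was billed.

```python
# ASSUMPTION: rates (USD per 1M tokens) inferred from the recorded response's
# costUSD; they are not taken from the pricing table in this chapter.
INFERRED_RATES = {"input": 5.00, "output": 25.00,
                  "cache_read": 0.50, "cache_write": 6.25}


def billed_cost(input_tokens, output_tokens, cache_read_tokens,
                cache_write_tokens, rates=INFERRED_RATES):
    """Price one call from its four token counters (hypothetical helper)."""
    per_million_total = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cache_read_tokens * rates["cache_read"]
        + cache_write_tokens * rates["cache_write"]
    )
    return per_million_total / 1_000_000


# Overhead before any prompt or answer tokens: cache read plus cache write.
overhead = billed_cost(0, 0, 14_253, 1_416)

# Full recorded call: 3 input tokens and 5,446 output tokens on top of that.
recorded = billed_cost(3, 5_446, 14_253, 1_416)

print(f"overhead ${overhead:.4f}, full call ${recorded:.7f}")
```

Under these assumptions the overhead lands just under $0.016, which is why a $0.01 budget cannot cover even an empty answer.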

Budget Is Not a Hard Cap

The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:

  1. Claude starts generating a response
  2. The response completes (full turn)
  3. Then the budget is checked
  4. If exceeded, the next turn does not start

This means a single turn can blow past the budget by any amount. A $0.001 budget can result in $0.15 of spend if the first turn generates a long response.
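The four steps above can be sketched as a toy simulation. This is not the CLI's actual implementation: the function name, the per-turn costs, and the exact comparison (`>=` before a turn starts) are assumptions chosen to mirror the behavior described here.

```python
def simulate_session(turn_costs, budget_usd):
    """Toy model: spend is compared with the budget only before a turn starts."""
    spent = 0.0
    turns_run = 0
    for cost in turn_costs:
        if spent >= budget_usd:
            break  # budget already exceeded, so the next turn does not start
        spent += cost  # the whole turn completes, whatever it costs
        turns_run += 1
    return {
        "turns_run": turns_run,
        "spent_usd": spent,
        "over_budget": spent > budget_usd,
    }


# Made-up session: an expensive first turn followed by two cheaper ones.
report = simulate_session([0.152, 0.040, 0.040], budget_usd=0.001)
print(report)
```

The first turn always runs because nothing has been spent yet, so the budget only limits how many turns follow it.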

Gotcha

--max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. If you set a budget of $0.001 and ask for a 5,000-word essay, Claude will write the entire essay before checking the budget. Always set budgets at least 10x your expected per-turn cost.

Here is a real response from setting a $0.001 budget and asking for a long essay:

Budget Exceeded: $0.001 budget, $0.152 actual (artifacts/17/budget_overshoot.json)

    {
      "type": "result",
      "subtype": "error_max_budget_usd",
      "duration_ms": 125999,
      "is_error": false,
      "num_turns": 1,
      "session_id": "7a950bc0-0afb-4754-b66c-d65a5288dbd3",
      "total_cost_usd": 0.1521415,
      "modelUsage": {
        "claude-opus-4-6": {
          "inputTokens": 3,
          "outputTokens": 5446,
          "cacheReadInputTokens": 14253,
          "cacheCreationInputTokens": 1416,
          "costUSD": 0.1521415
        }
      }
    }

  • subtype: not "success", because the budget was exceeded
  • total_cost_usd: 152x the $0.001 budget; a single turn can overshoot by any amount
  • outputTokens: 5,446 tokens generated before the budget check

The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.
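A minimal sketch of that check, assuming the JSON result has already been captured as a string. The helper name `classify_result` and its return labels are hypothetical, and lumping every other subtype into "other" is an assumption, since only two subtype values appear in this chapter.

```python
import json


def classify_result(raw_json, budget_usd=None):
    """Classify a result by its subtype field, not by is_error."""
    result = json.loads(raw_json)
    subtype = result.get("subtype")
    if subtype == "success":
        status = "ok"
    elif subtype == "error_max_budget_usd":
        status = "budget_exceeded"
    else:
        status = "other"  # assumption: any subtype not covered in this chapter
    report = {"status": status, "cost_usd": result.get("total_cost_usd")}
    if budget_usd and report["cost_usd"] is not None:
        report["overshoot_ratio"] = report["cost_usd"] / budget_usd
    return report


# Abridged copy of the response shown above.
raw = ('{"type": "result", "subtype": "error_max_budget_usd", '
       '"is_error": false, "num_turns": 1, "total_cost_usd": 0.1521415}')
report = classify_result(raw, budget_usd=0.001)
print(report["status"], round(report["overshoot_ratio"]))
```

A check based on is_error alone would report this call as healthy, even though it spent roughly 152 times its budget.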

Cache Economics

Caching is where the real savings happen. Here is what a typical call looks like under the hood.

First call to a new session:

  • ~14K tokens read from cache (system prompt, cached from a previous session)
  • ~1.4K tokens written to cache (new session-specific content)

Subsequent calls (same session or same system prompt):

  • Almost everything hits cache reads at the 0.1x rate
  • Only new content creates cache entries

Savings calculation for a 10-turn Opus session:

| | Without Caching | With Caching |
|---|---|---|
| Cost | 10 x 15K input tokens x $15/M = $2.25 | 1 x 15K cache write (1-hour rate, $30/M) + 9 x 15K cache reads ($1.50/M) = $0.65 |
| Savings | | 71% |

The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.
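The savings figure can be reproduced with a few lines of arithmetic. This sketch assumes a fixed 15K-token prefix on every turn and uses the Opus rates from the pricing table; the $0.65 total corresponds to the 1-hour cache-write rate ($30/M), while the 5-minute rate would give roughly $0.48 instead. Variable names are illustrative.

```python
TURNS = 10
PREFIX_TOKENS = 15_000

# Opus rates in USD per 1M tokens, from the pricing table.
INPUT_RATE = 15.00
CACHE_WRITE_1H_RATE = 30.00
CACHE_READ_RATE = 1.50


def usd(tokens, rate_per_million):
    """Convert a token count to dollars at a per-million rate."""
    return tokens * rate_per_million / 1_000_000


# Without caching: the full prefix is billed as fresh input on every turn.
uncached = TURNS * usd(PREFIX_TOKENS, INPUT_RATE)

# With caching: one cache write on the first turn, cache reads on the other nine.
cached = (usd(PREFIX_TOKENS, CACHE_WRITE_1H_RATE)
          + (TURNS - 1) * usd(PREFIX_TOKENS, CACHE_READ_RATE))

savings = 1 - cached / uncached
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, savings {savings:.0%}")
```

Raising TURNS pushes the savings closer to the 90% ceiling, because the one-time cache write is spread over more cache reads.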

Token Cost Calculator

Example settings: 10K input tokens, 1K output tokens, 5 turns per session, 50% cache hit rate (50% miss).

Estimated costs (Opus):

  • Per call: $0.1575
  • Per session (5 turns): $0.79
  • Daily (10 sessions): $7.88

5 Cost Optimization Strategies

  1. Use Sonnet for simple tasks — Run claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is 5x cheaper per token. Reserve Opus for complex reasoning.

  2. Disable tools for pure text tasks — Run claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens.

  3. Use --effort low for simple questions — Run claude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens.

  4. Set realistic budget caps — Minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.

  5. Batch similar prompts to maximize cache hits — Run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.
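Several of these strategies can be combined when a script assembles the command line. The sketch below only builds the argument list; it uses the flags exactly as they appear in the list above, and the helper name, its parameters, and the idea of combining all of them in one call are assumptions for illustration.

```python
import shlex


def build_claude_command(prompt, simple=False, needs_tools=True, budget_usd=None):
    """Assemble a claude CLI argument list from the strategies above (hypothetical helper)."""
    argv = ["claude", "-p", prompt]
    if simple:
        # Strategies 1 and 3: cheaper model and lower effort for simple tasks.
        argv += ["--model", "claude-sonnet-4-6", "--effort", "low"]
    if not needs_tools:
        # Strategy 2: an empty tool list for pure text tasks.
        argv += ["--tools", ""]
    if budget_usd is not None:
        # Strategy 4: an explicit budget, checked between turns.
        argv += ["--max-budget-usd", str(budget_usd)]
    return argv


argv = build_claude_command("Summarize this text", simple=True,
                            needs_tools=False, budget_usd=0.10)
print(shlex.join(argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later passed to `subprocess.run`.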

Industry Benchmarks

Cost Benchmarks

| Metric | Value |
|---|---|
| Average per dev/day | $6 |
| 90th percentile daily | $12 |
| Monthly estimate | $100-200/dev |
| 100-turn Opus (no cache) | $50-100 |
| 100-turn Opus (with cache) | $10-19 |

The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.

Tip

Track costs per call by piping to jq: claude -p "…" --output-format json | jq '{cost: .total_cost_usd, tokens: .usage.output_tokens}'
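For more than a handful of calls, the same two fields can be totaled in a script. This is a minimal sketch that assumes each call's JSON result has been saved as one line of text. The helper name is hypothetical and the sample lines are illustrative: the first borrows the cost and output token count from the budget-overshoot response earlier in this chapter (assuming usage.output_tokens matches that count), and the second is made up.

```python
import json


def summarize_calls(json_lines):
    """Total cost and output tokens across saved JSON results (hypothetical helper)."""
    calls = 0
    total_cost = 0.0
    total_output_tokens = 0
    for line in json_lines:
        if not line.strip():
            continue  # skip blank lines
        result = json.loads(line)
        calls += 1
        total_cost += result.get("total_cost_usd", 0.0)
        total_output_tokens += result.get("usage", {}).get("output_tokens", 0)
    return {"calls": calls, "cost_usd": total_cost,
            "output_tokens": total_output_tokens}


# Illustrative input, one JSON result per line.
sample = [
    '{"total_cost_usd": 0.1521415, "usage": {"output_tokens": 5446}}',
    '{"total_cost_usd": 0.016, "usage": {"output_tokens": 12}}',
]
summary = summarize_calls(sample)
print(summary)
```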