When a single developer uses Claude CLI, costs are intuitive — you see the bill and adjust. When ten developers run thousands of API calls per day, costs become a system design problem. This chapter covers the budgeting, model selection, and optimization strategies that keep production workloads predictable and affordable.
See What It Costs for per-call pricing and cache economics.
Team Cost Benchmarks
Before setting organizational budgets, you need a baseline. These figures come from real-world Claude Code usage across development teams.
| Metric | Value | Planning Note |
|---|---|---|
| Average per developer/day | $6 | Use this for baseline budgeting |
| 90th percentile daily | $12 | Set alerts at this threshold |
| Monthly estimate per developer | $100-200 | 22 working days at $5-9/day |
| 10-person team monthly | $1,000-2,000 | Budget for $2,500 to cover spikes |
| 50-person team monthly | $5,000-10,000 | Negotiate volume pricing at this tier |
These numbers assume a mix of Opus and Sonnet usage with caching enabled. Without caching, multiply by 3-5x. A 100-turn Opus session costs $50-100 uncached versus $10-19 cached — that difference compounds fast across a team.
Track per-developer spend with --output-format json and aggregate total_cost_usd across calls. Tools like ccusage can automate this across your organization.
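One way to aggregate spend is sketched here, assuming each call's cost has already been logged as a `developer cost_usd` pair per line. That log format is hypothetical (Claude CLI does not write it itself), as is the `spend_report` name. The default alert threshold of 12.00 is the 90th-percentile figure from the table above.

```shell
# spend_report: read "developer cost_usd" lines on stdin and print one
# total per developer, marking anyone above the alert threshold.
# The log format and the function name are assumptions for this sketch.
spend_report() {
  threshold="${1:-12.00}"   # default: the 90th-percentile daily figure
  awk -v limit="$threshold" '
    { total[$1] += $2 }
    END {
      for (dev in total) {
        flag = (total[dev] > limit) ? " ALERT" : ""
        printf "%s %.2f%s\n", dev, total[dev], flag
      }
    }
  ' | sort
}
```

Each input line could be produced after a call by pairing `$(whoami)` with the `total_cost_usd` value extracted by jq, then appending it to a shared log that is piped into `spend_report`.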
Budget Controls
Claude CLI provides budget controls at three levels: per-call, per-session, and organizational.
Per-Call Budgets
The --max-budget-usd flag sets a spending limit on a single invocation:
    # Cap a code review at $2
    claude -p "Review this PR for issues" --output-format json --max-budget-usd 2.00

    # Cap a quick lookup at $0.10
    claude -p "What does this function do?" --output-format json --max-budget-usd 0.10

In scripts that run unattended, always set a budget. Without one, a multi-turn agent loop can burn through dollars before anyone notices.
The budget is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. Set budgets at least 10x your expected per-turn cost. A $0.10 minimum is practical for Opus; anything below $0.02 will always trigger a budget error from the system prompt alone.
Experimentally measured overshoot ratios for Opus (2026-03-19):
Budget Overshoot by Level
| Budget | Actual Cost | Overshoot Ratio | Verdict |
|---|---|---|---|
| $0.001 | $0.039 - $0.048 | 38 - 48x | Minimum API floor dominates |
| $0.01 | $0.017 | 1.7x | Reasonable at realistic budgets |
| $0.10+ | Near budget | ~1x | Budget controls work as expected |
The minimum API call floor for Opus is ~$0.02-0.05 (system prompt + one turn). Any budget below this floor guarantees a massive overshoot ratio because the first turn always completes regardless. For practical budget control, use $0.05+ for single-turn tasks and $0.10+ for multi-turn.
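The sizing rules above can be folded into a small helper. This is a sketch, not part of Claude CLI: `pick_budget` is a hypothetical name, and the floors are the $0.05 (single-turn) and $0.10 (multi-turn) figures quoted above for Opus.

```shell
# pick_budget: suggest a --max-budget-usd value.
#   $1 = expected per-turn cost in USD
#   $2 = "single" or "multi" (defaults to "multi")
# Applies the 10x rule, then raises the result to the practical floor.
pick_budget() {
  awk -v per_turn="$1" -v mode="${2:-multi}" 'BEGIN {
    floor = (mode == "single") ? 0.05 : 0.10
    budget = per_turn * 10
    if (budget < floor) budget = floor
    printf "%.2f\n", budget
  }'
}
```

The result can be passed straight to the flag, for example `--max-budget-usd "$(pick_budget 0.03)"`.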
Per-Session Budgets
When using --resume to continue a conversation, the budget tracks cumulative cost across all calls in that session:
    # First call: uses part of the $5 budget and captures the session ID
    RESULT=$(claude -p "Analyze the codebase" --output-format json --max-budget-usd 5.00)
    SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')

    # Resumed call: shares the same $5 budget
    claude -p "Now refactor the auth module" --resume "$SESSION_ID" --max-budget-usd 5.00

A $5 budget shared across 5 resumed calls averages about $1 per call. Plan your per-session budget around the total work you expect, not individual turns.
Organizational Limits
For teams, per-call budgets are not enough. You need a wrapper that enforces organizational policy:
    #!/bin/bash
    # team-claude.sh: wrapper that enforces an org-wide daily limit per developer
    DAILY_LIMIT=15.00
    DAILY_LOG="/tmp/claude-cost-$(whoami)-$(date +%Y%m%d).log"

    # Make sure today's log exists so awk always has a file to read
    touch "$DAILY_LOG"

    # Sum today's spend (one cost per line)
    SPENT=$(awk '{sum += $1} END {printf "%.4f", sum}' "$DAILY_LOG")

    # Stop if nothing is left for today
    REMAINING=$(echo "$DAILY_LIMIT - $SPENT" | bc)
    if (( $(echo "$REMAINING <= 0" | bc -l) )); then
      echo "Daily budget exhausted ($DAILY_LIMIT). Resets tomorrow." >&2
      exit 1
    fi

    # Run with the remaining budget as the cap, then record what the call cost
    RESULT=$(claude -p "$@" --output-format json --max-budget-usd "$REMAINING")
    COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
    echo "$COST" >> "$DAILY_LOG"
    echo "$RESULT"

This pattern gives you a daily per-developer cap while still letting individual calls use --max-budget-usd for finer control underneath.
Model Selection for Cost
Model choice is the single biggest lever for cost at scale. Sonnet is 5x cheaper than Opus per token — $3/$15 per million input/output versus $15/$75.
Model Selection Guide
| Task Type | Model | Why |
|---|---|---|
| Code formatting, linting fixes | claude-sonnet-4-6 | Mechanical transforms; Sonnet handles these equally well |
| Test generation | claude-sonnet-4-6 | Pattern-following work with clear templates |
| Summarization, documentation | claude-sonnet-4-6 | Text synthesis where speed matters more than depth |
| Complex refactoring | claude-opus-4-6 | Cross-file reasoning with architectural implications |
| Bug diagnosis | claude-opus-4-6 | Requires deep reasoning about state and side effects |
| Architecture decisions | claude-opus-4-6 | Trade-off analysis requiring broad context understanding |
Switch models with the --model flag:
    # Sonnet for routine tasks (~$0.005/call)
    claude -p "Add JSDoc comments to this file" --model claude-sonnet-4-6

    # Opus for complex reasoning (~$0.016/call minimum)
    claude -p "Find the race condition in this auth flow" --model claude-opus-4-6

In CI pipelines, default to Sonnet and escalate to Opus only when Sonnet’s output fails validation or quality checks. A two-pass strategy, in which Sonnet drafts and Opus reviews only when needed, can cut pipeline costs by 60-80%.
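The escalation idea might look like the following sketch. `draft_then_escalate` is a hypothetical helper, and the validator is whatever command your pipeline already trusts (a linter, a test runner, a schema check). It is assumed to read the draft on stdin and exit 0 when the draft is acceptable.

```shell
# draft_then_escalate: try Sonnet first, fall back to Opus only if the
# caller-supplied validator rejects the draft (or Sonnet returns nothing).
#   $1 = prompt
#   $2 = name of a command that reads the draft on stdin and exits 0 if OK
draft_then_escalate() {
  draft="$(claude -p "$1" --model claude-sonnet-4-6)" || draft=""
  if [ -n "$draft" ] && printf '%s\n' "$draft" | "$2" >/dev/null; then
    printf '%s\n' "$draft"
  else
    claude -p "$1" --model claude-opus-4-6
  fi
}
```

Because the Opus call only happens on validation failure, the savings depend on how often Sonnet's draft passes. A strict validator escalates more often and saves less.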
In a wrapper script, set the default model based on the task category. Route formatting, linting, and boilerplate generation to Sonnet automatically. Reserve Opus for commands that explicitly request it or for tasks that require multi-file reasoning.
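A minimal sketch of that routing follows. The `pick_model` name and the category labels are hypothetical and would need to match whatever task names your own wrapper uses. Anything not explicitly listed falls through to Sonnet.

```shell
# pick_model: map a task category to a model ID.
# Category names are illustrative; unlisted categories default to Sonnet.
pick_model() {
  case "$1" in
    refactor|debug|architecture) echo "claude-opus-4-6" ;;
    *)                           echo "claude-sonnet-4-6" ;;
  esac
}
```

A wrapper could then call `claude -p "$PROMPT" --model "$(pick_model "$CATEGORY")"`, where `PROMPT` and `CATEGORY` are variables the wrapper sets itself.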
Effort Levels at Scale
The --effort flag controls how much thinking Claude does before responding. Lower effort means fewer output tokens, which directly reduces cost.
Effort Level Comparison
| Effort | Behavior | Best For | Relative Cost |
|---|---|---|---|
| low | Minimal thinking, short responses | Lookups, simple transforms, yes/no questions | ~0.5x |
| medium | Balanced thinking and output | Standard code generation, explanations | ~1x (baseline) |
| high | Extended reasoning, thorough output | Complex debugging, multi-step analysis | ~1.5-2x |
| max | Maximum reasoning depth | Hardest problems, architecture reviews | ~2-3x |
At scale, effort tuning adds up. If 70% of your team’s calls are simple lookups or formatting tasks, routing them through --effort low can cut total output token costs nearly in half for those calls:
    # Low effort for simple tasks
    claude -p "What is the return type of this function?" --effort low

    # Default effort for standard work
    claude -p "Write unit tests for the auth module"

    # Max effort for the hardest problems
    claude -p "Diagnose why this distributed lock fails under contention" --effort max

Effort affects output token count, not input tokens. The system prompt overhead is the same regardless of effort level. For calls dominated by input cost (large context, many files), effort tuning has minimal impact. It matters most for calls that generate long responses.
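To see why input-heavy calls barely benefit, here is a rough estimator. It is a sketch under simplifying assumptions: `call_cost` is a hypothetical name, it uses the Opus prices quoted earlier ($15/$75 per million input/output tokens), it ignores caching, and it treats the approximate relative-cost figures from the table as a multiplier on output tokens only.

```shell
# call_cost: rough Opus cost for one call, in USD.
#   $1 = input tokens
#   $2 = output tokens at medium effort
#   $3 = effort multiplier applied to output tokens only (e.g. 0.5 for low)
call_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v mult="$3" 'BEGIN {
    printf "%.4f\n", (in_tok * 15 + out_tok * mult * 75) / 1000000
  }'
}

call_cost 50000 400 1     # input-heavy, medium effort -> 0.7800
call_cost 50000 400 0.5   # input-heavy, low effort    -> 0.7650
call_cost 2000 4000 1     # output-heavy, medium       -> 0.3300
call_cost 2000 4000 0.5   # output-heavy, low          -> 0.1800
```

In this illustration, low effort trims about 2% from the input-heavy call but about 45% from the output-heavy one.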
Cache Optimization
Caching is automatic in Claude CLI, but how you structure your workload determines whether you get 90% savings or pay full price on every call.
Batch Similar Prompts
Calls that share the same system prompt benefit from cache reads at one-tenth the input price. Run similar tasks together rather than interleaving different workloads:
    # Good: batch similar tasks so the second call gets cache hits
    claude -p "Review src/auth.ts for security issues" --output-format json
    claude -p "Review src/auth.ts for performance issues" --output-format json

    # Wasteful: interleave different system contexts
    claude -p "Review src/auth.ts" --output-format json
    claude -p "Format this JSON" --model claude-sonnet-4-6 --output-format json
    claude -p "Review src/db.ts" --output-format json

Maintain Sessions
Resuming a session is cheaper than starting fresh. Every resumed turn reads the accumulated context from cache at the 0.1x rate instead of re-ingesting it:
    # Start a session
    RESULT=$(claude -p "Analyze the codebase structure" --output-format json)
    SESSION=$(echo "$RESULT" | jq -r '.session_id')

    # Resume: context is cached, so subsequent turns are cheaper
    claude -p "Now focus on the API layer" --resume "$SESSION" --output-format json
    claude -p "Suggest improvements" --resume "$SESSION" --output-format json

The Cache Math
Understanding the numbers helps you make better decisions about session management:
Cache Impact on a 10-Turn Opus Session
| Scenario | Calculation | Cost |
|---|---|---|
| Fresh call each time | 10 x 15K input tokens at $15/M | $2.25 |
| All cached | 1 cache write + 9 cache reads | $0.65 |
| Savings | ($2.25 - $0.65) / $2.25 | 71% |
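For a team of 10 developers each running 50 calls per day, the table's figures work out to about $0.16 saved per call, or roughly $80/day at a 15K-token context. The gap grows in proportion to context size: at around 45K tokens per call it reaches roughly $240/day, over $5,000/month in savings from session management alone.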
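The table's arithmetic can be reproduced with a short helper. This is a sketch under stated assumptions: `session_cost` is a hypothetical name, cache reads are billed at the 0.1x rate mentioned above, and the cache-write multiplier defaults to 2.0. That default is chosen here because it reproduces the table's $0.65, not because the text states it.

```shell
# session_cost: compare uncached and cached input cost for one session.
#   $1 = turns
#   $2 = input tokens per turn
#   $3 = input price in USD per million tokens
#   $4 = cache-write multiplier (assumed 2.0 if omitted)
# Cache reads are billed at 0.1x the input price.
session_cost() {
  awk -v turns="$1" -v tok="$2" -v price="$3" -v wmult="${4:-2.0}" 'BEGIN {
    per_turn = tok * price / 1000000
    uncached = turns * per_turn
    cached   = per_turn * wmult + (turns - 1) * per_turn * 0.1
    printf "uncached=%.2f cached=%.2f savings=%.0f%%\n",
           uncached, cached, (1 - cached / uncached) * 100
  }'
}

session_cost 10 15000 15   # prints: uncached=2.25 cached=0.65 savings=71%
```

Changing the second argument shows how the absolute saving scales linearly with context size while the percentage stays the same.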
Disable Tools When Not Needed
Tool descriptions inflate the system prompt. For pure text tasks, stripping them out reduces input tokens:
    claude -p "Summarize this document" --tools ""

This removes tool definitions from the system prompt, trimming the per-call input token count. The savings are modest per call but compound across thousands of daily invocations.
Cache entries have TTLs — 5 minutes for short-lived caches, 1 hour for extended caches. If you batch tasks with long gaps between them, the cache may expire and you pay full write cost again. Keep batch runs tight to maximize cache hits.
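A batch script can guard against expiry with a simple elapsed-time check. The helper below is a sketch: `cache_is_warm` is a hypothetical name, and it only compares timestamps against the TTL figures above. It does not query the actual cache state.

```shell
# cache_is_warm: succeed if the previous call was recent enough for its
# cache entry to plausibly still be alive.
#   $1 = epoch seconds of the previous call
#   $2 = epoch seconds now
#   $3 = TTL in seconds (default 300, the 5-minute figure above)
cache_is_warm() {
  [ $(( $2 - $1 )) -lt "${3:-300}" ]
}
```

In a batch loop, record `date +%s` after each call and test `cache_is_warm "$LAST" "$(date +%s)"` before the next one. A failed check is a signal to regroup the remaining tasks instead of paying the write cost repeatedly.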