Cost at Scale

Budgets, benchmarks, and optimization for production workloads

When a single developer uses Claude CLI, costs are intuitive — you see the bill and adjust. When ten developers run thousands of API calls per day, costs become a system design problem. This chapter covers the budgeting, model selection, and optimization strategies that keep production workloads predictable and affordable.

See What It Costs for per-call pricing and cache economics.

Team Cost Benchmarks

Before setting organizational budgets, you need a baseline. These figures come from real-world Claude Code usage across development teams.

Team Cost Benchmarks

MetricValuePlanning Note
Average per developer/day$6Use this for baseline budgeting
90th percentile daily$12Set alerts at this threshold
Monthly estimate per developer$100-20022 working days at $5-9/day
10-person team monthly$1,000-2,000Budget for $2,500 to cover spikes
50-person team monthly$5,000-10,000Negotiate volume pricing at this tier

These numbers assume a mix of Opus and Sonnet usage with caching enabled. Without caching, multiply by 3-5x. A 100-turn Opus session costs $50-100 uncached versus $10-19 cached — that difference compounds fast across a team.

Tip

Track per-developer spend with —output-format json and aggregate total_cost_usd across calls. Tools like ccusage can automate this across your organization.

Budget Controls

Claude CLI provides budget controls at three levels: per-call, per-session, and organizational.

Per-Call Budgets

The --max-budget-usd flag sets a spending limit on a single invocation:

Terminal window
# Cap a code review at $2
claude -p "Review this PR for issues" --output-format json --max-budget-usd 2.00
# Cap a quick lookup at $0.10
claude -p "What does this function do?" --output-format json --max-budget-usd 0.10

In scripts that run unattended, always set a budget. Without one, a multi-turn agent loop can burn through dollars before anyone notices.

Gotcha

The budget is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. Set budgets at least 10x your expected per-turn cost. A $0.10 minimum is practical for Opus; anything below $0.02 will always trigger a budget error from the system prompt alone.

Experimentally measured overshoot ratios for Opus (2026-03-19):

Budget Overshoot by Level

BudgetActual CostOvershoot RatioVerdict
$0.001$0.039 - $0.04838 - 48xMinimum API floor dominates
$0.01$0.0171.7xReasonable at realistic budgets
$0.10+Near budget~1xBudget controls work as expected

The minimum API call floor for Opus is ~$0.02-0.05 (system prompt + one turn). Any budget below this floor guarantees a massive overshoot ratio because the first turn always completes regardless. For practical budget control, use $0.05+ for single-turn tasks and $0.10+ for multi-turn.

Per-Session Budgets

When using --resume to continue a conversation, the budget tracks cumulative cost across all calls in that session:

Terminal window
# First call — uses part of the $5 budget
claude -p "Analyze the codebase" --output-format json --max-budget-usd 5.00
# Resumed call — shares the same $5 budget
claude -p "Now refactor the auth module" --resume $SESSION_ID --max-budget-usd 5.00

A $5 budget shared across 5 resumed calls averages about $1 per call. Plan your per-session budget around the total work you expect, not individual turns.

Organizational Limits

For teams, per-call budgets are not enough. You need a wrapper that enforces organizational policy:

#!/bin/bash
# team-claude.sh — wrapper that enforces org-wide daily limits
DAILY_LIMIT=15.00
DAILY_LOG="/tmp/claude-cost-$(whoami)-$(date +%Y%m%d).log"
# Sum today's spend
SPENT=$(awk '{sum += $1} END {printf "%.4f", sum}' "$DAILY_LOG" 2>/dev/null || echo "0")
REMAINING=$(echo "$DAILY_LIMIT - $SPENT" | bc)
if (( $(echo "$REMAINING <= 0" | bc -l) )); then
echo "Daily budget exhausted ($DAILY_LIMIT). Resets tomorrow." >&2
exit 1
fi
# Run with remaining budget as cap
RESULT=$(claude -p "$@" --output-format json --max-budget-usd "$REMAINING")
COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
echo "$COST" >> "$DAILY_LOG"
echo "$RESULT"

This pattern gives you a daily per-developer cap while still letting individual calls use --max-budget-usd for finer control underneath.

Model Selection for Cost

Model choice is the single biggest lever for cost at scale. Sonnet is 5x cheaper than Opus per token — $3/$15 per million input/output versus $15/$75.

Model Selection Guide

Task TypeModelWhy
Code formatting, linting fixesclaude-sonnet-4-6Mechanical transforms; Sonnet handles these equally well
Test generationclaude-sonnet-4-6Pattern-following work with clear templates
Summarization, documentationclaude-sonnet-4-6Text synthesis where speed matters more than depth
Complex refactoringclaude-opus-4-6Cross-file reasoning with architectural implications
Bug diagnosisclaude-opus-4-6Requires deep reasoning about state and side effects
Architecture decisionsclaude-opus-4-6Trade-off analysis requiring broad context understanding

Switch models with the --model flag:

Terminal window
# Sonnet for routine tasks — ~$0.005/call
claude -p "Add JSDoc comments to this file" --model claude-sonnet-4-6
# Opus for complex reasoning — ~$0.016/call minimum
claude -p "Find the race condition in this auth flow" --model claude-opus-4-6

In CI pipelines, default to Sonnet and escalate to Opus only when Sonnet’s output fails validation or quality checks. A two-pass strategy — Sonnet drafts, Opus reviews only when needed — can cut pipeline costs by 60-80%.

Tip

In a wrapper script, set the default model based on the task category. Route formatting, linting, and boilerplate generation to Sonnet automatically. Reserve Opus for commands that explicitly request it or for tasks that require multi-file reasoning.

Effort Levels at Scale

The --effort flag controls how much thinking Claude does before responding. Lower effort means fewer output tokens, which directly reduces cost.

Effort Level Comparison

EffortBehaviorBest ForRelative Cost
lowMinimal thinking, short responsesLookups, simple transforms, yes/no questions~0.5x
mediumBalanced thinking and outputStandard code generation, explanations~1x (baseline)
highExtended reasoning, thorough outputComplex debugging, multi-step analysis~1.5-2x
maxMaximum reasoning depthHardest problems, architecture reviews~2-3x

At scale, effort tuning adds up. If 70% of your team’s calls are simple lookups or formatting tasks, routing them through --effort low can cut total output token costs nearly in half for those calls:

Terminal window
# Low effort for simple tasks
claude -p "What is the return type of this function?" --effort low
# Default effort for standard work
claude -p "Write unit tests for the auth module"
# Max effort for the hardest problems
claude -p "Diagnose why this distributed lock fails under contention" --effort max
Note

Effort affects output token count, not input tokens. The system prompt overhead is the same regardless of effort level. For calls dominated by input cost (large context, many files), effort tuning has minimal impact. It matters most for calls that generate long responses.

Cache Optimization

Caching is automatic in Claude CLI, but how you structure your workload determines whether you get 90% savings or pay full price on every call.

Batch Similar Prompts

Calls that share the same system prompt benefit from cache reads at one-tenth the input price. Run similar tasks together rather than interleaving different workloads:

Terminal window
# Good: batch similar tasks — second call gets cache hits
claude -p "Review src/auth.ts for security issues" --output-format json
claude -p "Review src/auth.ts for performance issues" --output-format json
# Wasteful: interleave different system contexts
claude -p "Review src/auth.ts" --output-format json
claude -p "Format this JSON" --model claude-sonnet-4-6 --output-format json
claude -p "Review src/db.ts" --output-format json

Maintain Sessions

Resuming a session is cheaper than starting fresh. Every resumed turn reads the accumulated context from cache at the 0.1x rate instead of re-ingesting it:

Terminal window
# Start a session
RESULT=$(claude -p "Analyze the codebase structure" --output-format json)
SESSION=$(echo "$RESULT" | jq -r '.session_id')
# Resume — context is cached, subsequent turns are cheaper
claude -p "Now focus on the API layer" --resume "$SESSION" --output-format json
claude -p "Suggest improvements" --resume "$SESSION" --output-format json

The Cache Math

Understanding the numbers helps you make better decisions about session management:

Cache Impact on a 10-Turn Opus Session

ScenarioCalculationCost
Fresh call each time10 x 15K input tokens at $15/M$2.25
All cached1 cache write + 9 cache reads$0.65
Savings71%

For a team of 10 developers each running 50 calls per day, the difference between cached and uncached workloads is roughly $240/day — over $5,000/month in savings from session management alone.

Disable Tools When Not Needed

Tool descriptions inflate the system prompt. For pure text tasks, stripping them out reduces input tokens:

Terminal window
claude -p "Summarize this document" --tools ""

This removes tool definitions from the system prompt, trimming the per-call input token count. The savings are modest per call but compound across thousands of daily invocations.

Gotcha

Cache entries have TTLs — 5 minutes for short-lived caches, 1 hour for extended caches. If you batch tasks with long gaps between them, the cache may expire and you pay full write cost again. Keep batch runs tight to maximize cache hits.