If you're building with Claude, the pricing page alone won't tell you what you actually need to know. This post breaks down every model, every discount mechanism, and what real workloads actually cost — with no hand-waving.
The Short Answer
Claude API pricing is token-based. You pay per million tokens (MTok) for input and output separately. Costs range from $0.25/MTok input on the cheapest legacy model to $5/MTok input on Claude Opus 4.6. Output tokens cost more than input tokens across all models.
The model you pick matters enormously. Haiku 4.5 costs 5x less than Opus 4.6 for the same token volume, and legacy Haiku 3 costs 20x less.
Current Claude API Pricing by Model
All prices are in USD per million tokens (MTok), as of mid-2025.
Current Generation Models
| Model | Input ($/MTok) | Output ($/MTok) | Context Window |
|---|---|---|---|
| claude-opus-4-6 | $5.00 | $25.00 | 1M tokens |
| claude-opus-4-5 | $5.00 | $25.00 | 200K tokens |
| claude-sonnet-4-6 | $3.00 | $15.00 | 1M tokens |
| claude-sonnet-4-5 | $3.00 | $15.00 | 200K tokens |
| claude-haiku-4-5 | $1.00 | $5.00 | 200K tokens |
Legacy Models (Still Available)
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| claude-haiku-3-5 | $0.80 | $4.00 |
| claude-haiku-3 | $0.25 | $1.25 |
| claude-opus-4 | $15.00 | $75.00 |
Opus 4 vs. Opus 4.6 is not a typo. The newer "4.6" generation models are dramatically cheaper than the original Opus 4. Anthropic significantly restructured pricing with the 4.5/4.6 generation — Opus 4.6 at $5/MTok input vs. Opus 4 at $15/MTok input for the same capability tier. Sonnet 4.6 and Opus 4.6 also include a full 1M token context window at standard price.
The long-context surcharge ($6/MTok input, $22.50/MTok output) only kicks in above 200K input tokens per request. Most applications don't hit this.
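The per-request math above is easy to wrap in a small helper. A minimal sketch in Python, using the Sonnet 4.6 rates from the tables; note it assumes the surcharge rates apply to the whole request once the input exceeds the threshold, which you should verify against the current billing docs:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 3.00, output_rate: float = 15.00,
                 long_input_rate: float = 6.00, long_output_rate: float = 22.50,
                 long_context_threshold: int = 200_000) -> float:
    """Cost in USD for one request. Rates are USD per MTok (Sonnet 4.6
    defaults). Assumption: surcharge rates apply to the full request
    once input exceeds the threshold; check the docs for the exact rule."""
    if input_tokens > long_context_threshold:
        in_rate, out_rate = long_input_rate, long_output_rate
    else:
        in_rate, out_rate = input_rate, output_rate
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A typical 2,000-in / 500-out chat request on Sonnet 4.6:
# 2000/1e6 * 3.00 + 500/1e6 * 15.00 = $0.0135
```

Swap in the rates from the model tables above to compare tiers.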
The Two Biggest Levers: Batch API and Prompt Caching
Before you worry about model selection, understand these two discounting mechanisms. They can cut your bill by 50–90%.
Batch API (50% off)
The Batch API processes requests asynchronously. You submit a batch, results come back within 24 hours. For any workload that isn't user-facing and real-time, this is the easiest way to halve your cost.
| Model | Batch Input ($/MTok) | Batch Output ($/MTok) |
|---|---|---|
| claude-opus-4-6 | $2.50 | $12.50 |
| claude-sonnet-4-6 | $1.50 | $7.50 |
| claude-haiku-4-5 | $0.50 | $2.50 |
| claude-haiku-3 (batch) | $0.125 | $0.625 |
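Submitting a batch looks roughly like this with the official `anthropic` Python SDK. The `custom_id`/`params` request shape follows the Message Batches API, but treat the exact field names and the `claude-haiku-4-5` model string as assumptions to verify against the current SDK docs:

```python
# Sketch of a Message Batches submission. Verify the request shape
# against the current anthropic SDK docs before relying on it.

def build_batch_requests(documents: list[str],
                         model: str = "claude-haiku-4-5") -> list[dict]:
    """One batch entry per document; custom_id lets you match results
    back to documents when the batch completes."""
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": model,
                "max_tokens": 300,
                "messages": [
                    {"role": "user",
                     "content": f"Classify and summarize:\n\n{doc}"}
                ],
            },
        }
        for i, doc in enumerate(documents)
    ]

# Submitting (requires ANTHROPIC_API_KEY; results within 24 hours):
# import anthropic
# client = anthropic.Anthropic()
# batch = client.messages.batches.create(requests=build_batch_requests(docs))
# print(batch.id, batch.processing_status)
```

The 50% discount applies automatically to anything submitted this way; there is no separate flag to set.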
Prompt Caching (up to 90% off on re-read)
If your requests share a large, repeated prefix — a system prompt, a reference document, a codebase — prompt caching lets the API reuse that processed context instead of re-tokenizing it every call.
Cache writes cost slightly more upfront (1.25x for a 5-minute cache, 2x for a 1-hour cache), but cache reads cost just 10% of normal input price. Break-even happens after one re-read on 5-minute caches, two re-reads on 1-hour caches. In most real applications with repeated system prompts, caching pays for itself within a single session.
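The break-even arithmetic is easy to verify yourself. A sketch using the multipliers above (1.25x write for the 5-minute cache, 0.1x reads, Sonnet 4.6 base rate):

```python
def cached_input_cost(prefix_tokens: int, rereads: int, base_rate: float = 3.00,
                      write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost (USD) for a cached prefix: one cache write, then N
    cached re-reads. Multipliers as stated in the article."""
    mtok = prefix_tokens / 1e6
    return mtok * base_rate * (write_mult + rereads * read_mult)

def uncached_input_cost(prefix_tokens: int, rereads: int,
                        base_rate: float = 3.00) -> float:
    """Same prefix sent at full price every time (first call + N more)."""
    return prefix_tokens / 1e6 * base_rate * (1 + rereads)

# For a 10K-token prefix: more expensive with zero re-reads (the 1.25x
# write premium), cheaper from the first re-read onward.
```

Running the two functions side by side for increasing re-read counts reproduces the break-even points claimed above.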
These two discounts stack. Batch processing + prompt caching can reduce effective input costs by 95% compared to naive API usage. This is not a rounding error — it's the difference between a $500/month bill and a $25/month bill for the same workload.
Real-World Cost Examples
Abstract pricing is less useful than seeing what actual applications cost. Here are three representative scenarios.
Scenario 1: Customer-Facing Chatbot at 1M Tokens/Day
Setup: A support chatbot. Each conversation averages 2,000 input tokens (system prompt + conversation history) and 500 output tokens. You handle roughly 400 conversations per day. That's 800K input tokens and 200K output tokens per day — approximately 1M total.
| Approach | Daily cost | Monthly cost |
|---|---|---|
| Sonnet 4.6, no caching | ~$5.40 | ~$162 |
| Sonnet 4.6 + prompt caching | ~$3.78 | ~$113 |
With prompt caching (assuming 1,500 of 2,000 input tokens are a shared system prompt): cache reads run at $0.30/MTok vs. $3.00/MTok — roughly 30% reduction on a real-time conversational workload.
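The scenario's figures reproduce directly from the rate table (Sonnet 4.6: $3 in, $15 out, $0.30 cached reads). The traffic numbers are the scenario's own assumptions, not measurements:

```python
IN_RATE, OUT_RATE, CACHED_RATE = 3.00, 15.00, 0.30   # Sonnet 4.6, $/MTok

daily_in, daily_out = 800_000, 200_000   # 400 convs * (2,000 in / 500 out)
cached_share = 1500 / 2000               # shared system prompt portion

no_cache = daily_in / 1e6 * IN_RATE + daily_out / 1e6 * OUT_RATE
with_cache = (daily_in * cached_share / 1e6 * CACHED_RATE
              + daily_in * (1 - cached_share) / 1e6 * IN_RATE
              + daily_out / 1e6 * OUT_RATE)

print(f"no cache:   ${no_cache:.2f}/day, ${no_cache * 30:.0f}/mo")
print(f"with cache: ${with_cache:.2f}/day, ${with_cache * 30:.0f}/mo")
# Matches the table: $5.40 vs $3.78 per day.
```

Cache-write premiums are ignored here; on a busy chatbot they are amortized across hundreds of reads per 5-minute window.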
Scenario 2: AI Coding Assistant
Setup: A coding assistant embedded in an IDE. Each interaction sends ~4,000 tokens of context (open files, instructions, conversation) and generates ~800 tokens of code. 50 interactions per day, 20 developers on the team = 1,000 requests/day: 4M input tokens, 800K output tokens.
| Model | Daily cost | Monthly cost |
|---|---|---|
| Claude Sonnet 4.6 | $24.00 | $720 |
| Claude Haiku 4.5 | $8.00 | $240 |
Model selection alone is a 3x cost difference here. Profile what your task actually needs before defaulting to Sonnet — for autocomplete-style tasks, Haiku is usually sufficient.
Scenario 3: Batch Document Processing
Setup: Nightly job that classifies and summarizes 10,000 documents. Average document is 800 input tokens; output is 150 tokens of structured summary. Not time-sensitive — results needed by morning.
| Approach | Per nightly run | Monthly |
|---|---|---|
| Sonnet 4.6, real-time | ~$46.50 | ~$1,395 |
| Haiku 4.5, Batch API | ~$7.75 | ~$232 |
The Batch API + Haiku combination runs this job at roughly one-sixth of what naive real-time Sonnet would cost, saving over $1,100 a month. That's a real architectural choice worth making at design time.
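The Haiku-on-Batch figure checks out against the batch table ($0.50 in, $2.50 out per MTok):

```python
docs = 10_000
in_tok, out_tok = docs * 800, docs * 150   # 8M in, 1.5M out per run
BATCH_IN, BATCH_OUT = 0.50, 2.50           # Haiku 4.5 on Batch API, $/MTok

per_run = in_tok / 1e6 * BATCH_IN + out_tok / 1e6 * BATCH_OUT
print(f"${per_run:.2f} per nightly run, ~${per_run * 30:,.0f}/month")
# 8 * 0.50 + 1.5 * 2.50 = $7.75 per run
```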
How to Actually Reduce Your Claude API Spend
- Start with Haiku, escalate if needed. Most routing and classification tasks, short-context completions, and structured extraction don't require Sonnet-level capability. Build with Haiku first. Upgrade specific flows that actually need it after you've benchmarked quality.
- Cache your system prompt. If you have a system prompt longer than ~500 tokens that's consistent across requests, add a `cache_control` block. It takes 10 minutes to implement and reduces those tokens to 10% cost on every subsequent call.
- Route to Batch for anything async. Data pipelines, report generation, overnight jobs, test suite evaluation — none of these need real-time responses. The Batch API's 50% discount is automatic. There's no good reason to pay full price for non-interactive workloads.
- Trim your context aggressively. Every token in your input costs money. Audit your prompts: remove redundant instructions, truncate conversation history appropriately, and avoid including full documents when relevant snippets will do.
- Monitor per-request token counts during development. The API response includes `usage.input_tokens` and `usage.output_tokens`. Log these in dev. It's common to discover that a "simple" feature is sending 5K tokens per call because of an accidentally included debug payload or unexpectedly long tool schema.
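The `cache_control` tip looks like this in practice. A hedged sketch for the `anthropic` Python SDK; the system-as-blocks shape with `"cache_control": {"type": "ephemeral"}` follows the prompt-caching docs, but verify the field names against the current SDK before shipping:

```python
# Marking a long, stable system prompt as cacheable. Sketch only; the
# block shape follows Anthropic's prompt-caching docs. Verify before use.
LONG_SYSTEM_PROMPT = "You are a support agent for ..."  # imagine ~1,500 tokens

def build_message_params(user_text: str) -> dict:
    """Request params with the system prompt flagged for caching."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 500,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # 5-minute cache
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

# import anthropic
# client = anthropic.Anthropic()
# resp = client.messages.create(**build_message_params("Where is my order?"))
# resp.usage.cache_read_input_tokens shows whether the cache hit.
```

Everything before the `cache_control` marker is cached as a prefix, so put the stable content first and the per-request content after it.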
Tracking Costs in Production
The Anthropic Console gives you top-level usage dashboards, but it doesn't give you per-feature, per-user, or per-model breakdown out of the box. For teams that need cost attribution across multiple features or are watching spend against a budget, that gap gets painful quickly.
At the application layer, you should be logging token counts per request alongside request metadata (user ID, feature name, model used). From there you can build spend tracking in your data warehouse or use a purpose-built tool like ClawCost, which handles the aggregation and alerting without requiring you to instrument everything from scratch.
The point is: don't wait until your monthly bill arrives to understand your cost breakdown. Token usage patterns during development often look nothing like production traffic.
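A minimal sketch of that application-layer logging. The rate table and metadata fields here are illustrative; adapt them to your own schema:

```python
import json
import time

RATES = {  # $/MTok (input, output), from the pricing tables above
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}

def log_usage(model: str, input_tokens: int, output_tokens: int,
              user_id: str, feature: str) -> dict:
    """Emit one structured log record with an estimated cost attached."""
    in_rate, out_rate = RATES[model]
    record = {
        "ts": time.time(),
        "model": model,
        "user_id": user_id,
        "feature": feature,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "est_cost_usd": round(
            input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate, 6),
    }
    print(json.dumps(record))  # ship to your log pipeline / warehouse
    return record

# After each API call:
# log_usage(resp.model, resp.usage.input_tokens,
#           resp.usage.output_tokens, user_id, "chat")
```

With `model`, `user_id`, and `feature` on every record, the per-feature and per-user breakdowns the Console lacks become a single GROUP BY in your warehouse.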
Special Cases to Know About
- Tool use adds tokens. When you include tools in a request, Claude automatically adds a system prompt to enable them — roughly 313–346 extra input tokens depending on the tool choice mode. Tool definitions themselves also count as input tokens. In heavily agentic applications with large tool schemas, this overhead can be substantial.
- Web search costs extra. The built-in web search tool costs $10 per 1,000 searches on top of token costs. If you're building agents that search frequently, budget for this separately.
- Third-party platforms have their own pricing. Claude via AWS Bedrock or Google Vertex AI has different pricing, including regional vs. global endpoint premiums (regional is 10% more). Check those platforms' pricing pages directly rather than assuming parity with the direct API.
- Enterprise pricing is negotiable. At high volume, Anthropic's published rates aren't necessarily the floor. Custom pricing is available through their sales team.
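These surcharges are straightforward to budget for. A sketch combining the figures above; the ~330-token tool overhead is a midpoint of the stated range and varies by tool choice mode, and the search rate is the stated $10 per 1,000 searches:

```python
def agent_monthly_extras(requests_per_day: int, searches_per_request: float,
                         tool_overhead_tokens: int = 330,  # midpoint of 313-346
                         input_rate: float = 3.00,         # Sonnet 4.6, $/MTok
                         search_rate_per_1k: float = 10.00) -> dict:
    """Monthly cost of tool-use token overhead plus web-search fees,
    on top of normal token costs. Assumes a 30-day month."""
    overhead_usd = requests_per_day * tool_overhead_tokens / 1e6 * input_rate * 30
    search_usd = (requests_per_day * searches_per_request * 30
                  / 1000 * search_rate_per_1k)
    return {"tool_overhead_usd": round(overhead_usd, 2),
            "web_search_usd": round(search_usd, 2)}

# 1,000 agent requests/day averaging 2 searches each:
# web search alone is 1000 * 2 * 30 / 1000 * $10 = $600/month.
```

For search-heavy agents, the per-search fee usually dwarfs the token overhead, so budget it as its own line item.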
Quick Reference: Which Model Should I Use?
| Use Case | Recommended Model | Rough Cost Range |
|---|---|---|
| High-stakes reasoning, complex agents | claude-opus-4-6 | $5–$25/MTok |
| General-purpose dev/prod workloads | claude-sonnet-4-6 | $3–$15/MTok |
| High-volume classification, extraction | claude-haiku-4-5 | $1–$5/MTok |
| Ultra-high-volume batch, cost-sensitive | claude-haiku-3 (batch) | $0.125–$0.625/MTok |
The Bottom Line
Claude API pricing in 2025 is genuinely competitive at every tier, but the variance between "naive usage" and "optimized usage" is enormous. A team spending $500/month could often run the same workload for $50–$100 with the right model selection, prompt caching, and Batch API routing.
The decisions that matter most:
- Model selection — Haiku vs. Sonnet vs. Opus is often a 3–20x cost difference.
- Batch vs. real-time — a 50% cut on non-interactive work.
- Prompt caching — 90% reduction on repeated context.
Get those three right and you've done the heavy lifting. After that, logging token counts per feature and watching spend over time keeps surprises off your bill.
Pricing sourced from the Anthropic API documentation. Prices are subject to change — verify current rates before making financial projections.
Stop guessing where your API spend is going
ClawCost tracks every token — broken down by model, session, and project — in real time. Self-hosted, open source, free to start.
Get started free → See pricing