What does one AI agent run really cost, and can a power user blow your budget?

An agent run costs far more than a single API call. The harness re-sends a growing transcript every step, retries repeat work, tool calls add tokens, and reasoning tokens are billed as output. On a context-accumulating harness, total cost grows roughly with the square of the step count; prompt caching pulls it back toward linear. A few heavy users can dominate your bill.
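
To see where the quadratic term comes from, here is a back-of-the-envelope sketch. The symbols are assumptions for illustration, not the simulator's internals: $P$ is a fixed prompt prefix in tokens, $a$ is the number of tokens appended to the transcript per step, $N$ is the step count, and $r \approx 0.1$ and $w \approx 1.25$ are the cache-read and cache-write multipliers on the input price. It also assumes everything sent at the previous step is a cache hit at the next one.

```latex
% Input tokens billed over N steps, in input-token equivalents.
\begin{align}
T_{\text{uncached}} &= \sum_{k=1}^{N} \bigl(P + (k-1)\,a\bigr)
                     = NP + a\,\frac{N(N-1)}{2} \\
T_{\text{cached}}   &= w\,\bigl(P + (N-1)\,a\bigr)
                     + r\,\Bigl[(N-1)\,P + a\,\frac{(N-1)(N-2)}{2}\Bigr]
\end{align}
```

Under these assumptions the $N^2$ term survives caching but is scaled by $r$, so over typical run lengths the cached curve sits much closer to linear.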

This tool is for builders shipping coding agents or agent loops who need a per-run and per-user cost forecast before launch, and for anyone sizing the risk that power users or runaway loops quietly burn the budget.

Simulator inputs: agent harness, model, task, and reasoning & volume.
Model: … for Haiku ($1/$5) can cut a token bill ~5×.
Estimated cost / agent run: $1.77 (36 steps · 1.5M in · 68K out (40K reasoning))
  • Per run: $1.77 (Claude Code)
  • Monthly: $177 at your run volume
  • Cache savings: 67% vs $5.41 uncached
Why is this run expensive?
  • Reasoning: $0.599 · 34%
  • Output: $0.417 · 24%
  • Cache reads: $0.409 · 23%
  • Cache writes: $0.247 · 14%
  • Uncached input: $0.097 · 5%
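
The breakdown can be checked with a few lines of arithmetic. This sketch only re-adds the five components shown for this example run and compares the total with the $5.41 uncached figure; the variable names are illustrative.

```python
# Cost components for the example run, in dollars (taken from the breakdown).
components = {
    "Reasoning": 0.599,
    "Output": 0.417,
    "Cache reads": 0.409,
    "Cache writes": 0.247,
    "Uncached input": 0.097,
}

total = sum(components.values())

# Share of the run's cost attributable to each component.
shares = {name: cost / total for name, cost in components.items()}

# Savings relative to the uncached counterfactual reported for this run.
uncached_total = 5.41
cache_savings = 1 - total / uncached_total

print(f"total per run: ${total:.2f}")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
print(f"cache savings: {cache_savings:.0%}")
```

Reasoning and ordinary output together account for more than half of this run, even though they are a small fraction of the tokens.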
Cost distribution (P50 / P90 / P99)
The same task can vary up to ~30× in tokens, so the cost is a band, not a point. P50 is a typical run; budget against P90.
  • P50 (typical): $1.77
  • P90 (budget): $7.08
  • P99 (worst case): $16.56
  • End-to-end reliability: 18% · retry ×1.82
  • Expected monthly (incl. retries): $323
  • API monthly: $177
  • 3-year TCO (~25% API): $708/mo
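
If you have your own per-run costs, the same P50 / P90 / P99 band can be read off directly. The sketch below uses the nearest-rank method on a hypothetical list of run costs; the sample values and the `percentile` helper are assumptions for illustration, and the simulator may compute its band differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-run costs in dollars for the same task, run ten times.
run_costs = [1.2, 1.5, 1.6, 1.8, 1.8, 2.1, 2.4, 3.0, 6.9, 15.0]

p50 = percentile(run_costs, 50)
p90 = percentile(run_costs, 90)
p99 = percentile(run_costs, 99)

print(f"P50 ${p50:.2f} · P90 ${p90:.2f} · P99 ${p99:.2f}")
```

With a long right tail like this one, the mean sits well above P50, which is why budgeting against P90 is the safer default.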
Cost per step (chart): context accumulates, so input climbs across steps; caching keeps it in check.
SEED coefficients — a directional estimate, not calibrated to real runs.
Caching assumes a stable prompt prefix; any non-determinism busts it and raises cost.
Cost is stochastic — the P50/P90/P99 band reflects how the same task can vary up to ~30× in tokens.
Pricing validated to the cent against real Claude Code logs (ccusage/LiteLLM). In aggregate, real usage runs ~$0.68–0.94 / 1M tokens blended with ~98% cache reads; a single short task reads higher until cache reads accumulate.
Priced on Claude Sonnet 4.5 (Anthropic) at $3.00 in · $15.00 out / 1M.

Example scenario

The same task can cost very differently by harness. A context accumulator (Claude Code style) re-sends the whole transcript each step, so a long run climbs fast; a windowed harness keeps a bounded slice and a compressed harness sends summaries. Turn prompt caching on and the accumulator's bill drops sharply, because the repeated prefix is charged at the cache-read rate. The simulator shows both the cached cost and the uncached counterfactual.
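
A minimal sketch of how the three harness styles differ in tokens sent. The formulas and defaults here (`prefix`, `per_step`, `window`, `summary_ratio`) are hypothetical shapes chosen for illustration, not the simulator's coefficients.

```python
def input_tokens(harness, steps, prefix=1000, per_step=500, window=3, summary_ratio=0.1):
    """Total input tokens sent over a run, under simplified assumptions.

    accumulating: re-sends the full history every step
    windowed:     re-sends only the last `window` steps of history
    compressed:   re-sends a summary sized at `summary_ratio` of the history
    """
    total = 0
    for k in range(1, steps + 1):
        history = per_step * (k - 1)  # tokens produced by earlier steps
        if harness == "accumulating":
            sent = history
        elif harness == "windowed":
            sent = per_step * min(k - 1, window)
        elif harness == "compressed":
            sent = round(history * summary_ratio)
        else:
            raise ValueError(f"unknown harness: {harness}")
        total += prefix + sent
    return total


for name in ("accumulating", "windowed", "compressed"):
    print(name, input_tokens(name, steps=10))
```

Doubling `steps` roughly quadruples the history term for the accumulating harness, while the windowed total grows only linearly once the window is full.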

What the inputs mean

  • Harness type: how it builds context (accumulating, windowed, or compressed).
  • Model: sets the token and cache rates.
  • Task size and steps: how long the run is.
  • Prompt caching: read and write behavior for the repeated prefix.
  • Retries: repeated work when a step fails.
  • Reasoning: hidden tokens that are billed as output.

What the result means

You get a per-step and total forecast in tokens and dollars, the cached cost next to the uncached counterfactual so you can see what caching is doing, and the reasoning-token share that is invisible in the response but still billed.

Assumptions

  • Step-count and token-shape coefficients are seed values, directional until you tune them to your own runs.
  • The underlying prices and the cache model (read about 0.1 times input, write about 1.25 times input) are validated against real logs.
  • In aggregate usage, cache reads are about 98 percent of all tokens, so a long run's blended rate sits well below list input, around $0.68 to $0.94 per 1M tokens across viberank, clawdboard, and local ccusage.
  • A single short task reads a little higher before cache reads pile up.
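
The cache multipliers above translate into a blended rate as follows. This is a sketch: the prices are the Sonnet 4.5 rates quoted on this page, but the token mix is a hypothetical example and is not meant to reproduce the $0.68 to $0.94 aggregate, which depends on the real mix of models and token types in those logs.

```python
CACHE_READ_MULT = 0.1    # cache read is about 0.1x the input price
CACHE_WRITE_MULT = 1.25  # cache write is about 1.25x the input price


def blended_rate(price_in, price_out, mix):
    """Blended $ per 1M tokens for a mix of token types whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    rates = {
        "cache_read": price_in * CACHE_READ_MULT,
        "cache_write": price_in * CACHE_WRITE_MULT,
        "input": price_in,
        "output": price_out,
    }
    return sum(share * rates[kind] for kind, share in mix.items())


# Hypothetical mix dominated by cache reads.
mix = {"cache_read": 0.98, "cache_write": 0.01, "input": 0.005, "output": 0.005}
rate = blended_rate(price_in=3.00, price_out=15.00, mix=mix)
print(f"blended: ${rate:.4f} per 1M tokens")
```

Even a half-percent share of output tokens contributes a noticeable slice of the blended rate, because output is priced at five times input.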

Where the prices come from

Per-token and cache read/write rates come from the source-backed pricing index, where every figure links to the provider's own page and carries a last-checked date. This tool reads those committed numbers; it never calls a provider or fetches live prices.

How the calculation works

Token price is the same wherever you call a model; what differs is how much the harness sends. Each step bills the context it re-sends plus new output and any reasoning tokens. Caching charges the repeated prefix at the cheaper cache-read rate after a one-time write, which is why an accumulating harness with caching grows closer to linear than quadratic. Retries multiply a step's cost. The forecast combines these drivers; it does not change the model's published rates.
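
The drivers described above can be combined in a short sketch. Everything here is an assumption for illustration: the function name, the per-step token shape, the rule that the previous step's context is fully cached, and the flat `retry_multiplier`. It is not the simulator's actual model.

```python
def run_cost(steps, prefix, new_per_step, out_per_step, reasoning_per_step,
             price_in, price_out, cached=True, retry_multiplier=1.0):
    """Dollar cost of one run on an accumulating harness (prices are $ per 1M tokens)."""
    total = 0.0
    for k in range(1, steps + 1):
        context = prefix + (k - 1) * new_per_step  # tokens re-sent at step k
        if cached:
            # Assumption: everything sent at the previous step is a cache hit;
            # only the newly appended tokens are written to the cache.
            read = 0 if k == 1 else prefix + (k - 2) * new_per_step
            written = context - read
            input_cost = read * price_in * 0.1 + written * price_in * 1.25
        else:
            input_cost = context * price_in
        # Reasoning tokens are billed at the output rate.
        output_cost = (out_per_step + reasoning_per_step) * price_out
        total += (input_cost + output_cost) / 1_000_000
    return total * retry_multiplier


args = dict(steps=3, prefix=1000, new_per_step=1000, out_per_step=100,
            reasoning_per_step=100, price_in=3.00, price_out=15.00)
print(f"cached:   ${run_cost(**args, cached=True):.5f}")
print(f"uncached: ${run_cost(**args, cached=False):.5f}")
```

On a three-step run the gap is small because the write premium offsets part of the read discount; it widens as the step count grows.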

Frequently asked questions

Why does an agent cost so much more than a single chat?
Because the harness re-sends a growing context every step and adds retries, tool calls, and reasoning tokens. A ten-step run can bill many times the tokens of one call, especially without caching.
How much does prompt caching save on an agent run?
A lot on long runs, because the repeated prefix is charged at the cache-read rate (about a tenth of input) after a one-time write premium. The simulator shows the cached cost beside the uncached counterfactual.
What is the power-user risk?
A small number of users running long or looping agents can dominate spend. Forecasting a heavy run shows whether your pricing or caps survive that, before it shows up on the invoice.
Can I trust the forecast?
The prices and cache model are validated against real public usage logs; the step-count and token-shape coefficients are directional seed values until you calibrate them to your own runs. Treat it as a sized estimate, not a guarantee.
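
On the power-user question above, one quick way to size the risk is to measure how concentrated spend is. The sketch below uses hypothetical monthly per-user costs; `top_share` is an illustrative helper, not part of the tool.

```python
def top_share(costs, top_n):
    """Fraction of total spend attributable to the top_n highest-cost users."""
    total = sum(costs)
    if total == 0:
        return 0.0
    heaviest = sorted(costs, reverse=True)[:top_n]
    return sum(heaviest) / total


# Hypothetical monthly spend per user, in dollars.
monthly_costs = [400, 50, 20, 10, 10, 5, 5]

print(f"top 1 user:  {top_share(monthly_costs, 1):.0%} of spend")
print(f"top 2 users: {top_share(monthly_costs, 2):.0%} of spend")
```

If one or two accounts carry most of the bill, a flat per-seat price or a missing usage cap is where the forecast is most likely to break.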

Pricing data last checked 2026-06-01. Rates are read from the source-backed pricing index and its change history. This tool never calls a provider or fetches live prices.

ByteCosts

Cost intelligence for AI, cloud, and SaaS. Public pricing, normalized into an index and calculators that engineering and finance can use in the same room.

Catalog: 137 providers · 4,993 models · updated Jun 1, 2026

Prices via models.dev and custom scrapers · model quality benchmarks via Artificial Analysis

Disclaimer: All information provided is for reference purposes only. Actual costs may vary based on usage patterns and provider terms. Always monitor your own token consumption and billing dashboard to track real expenses.

© 2026 ByteCosts. All rights reserved.
Built on public pricing data and browser-side calculators. Figures are directional.