Does prompt caching cut your LLM bill, or does the write premium cost you more?

Prompt caching saves money only when a cached prefix is reused enough to offset its one-time write premium. Below the break-even reuse count you pay more than without caching; above it you save, approaching the cache-read discount (often around ninety percent on the cached portion) as your hit rate rises.

Built for teams with large, stable prompt prefixes (system prompts, retrieved context, few-shot examples) who need to decide whether caching is worth the write premium at their reuse rate.

Your workload
Model rates: $3.00 in · $15.00 out · read $0.300 · write $3.75 / 1M tokens

  • 2,000 tokens: system prompt + tool schemas, sent every request
  • 6,000 tokens: conversation history or retrieved docs reused across requests
  • 500 tokens: the new turn, never cacheable
  • 600 tokens: output generated per request
  • 70%: share of requests that reuse a warm prefix
  • 1,000: requests per day
  • Cache TTL (write tier): 5-minute (1.25× write rate)
Estimated monthly cache savings: $400 / month (39%), across 30,000 requests/mo

  • Without cache: $1,035 / month
  • With cache: $635 / month
  • Break-even reuse: 0.3× reads per write
Cost component (with cache), monthly:

  • Cache reads (cached prefix on a hit): $50.40
  • Cache writes (prefix (re)written on a miss): $270
  • Fresh input (new, non-cacheable tokens): $45.00
  • Output (generated tokens): $270
  • Effective input price: $1.43 / 1M tokens, vs $3.00 list (48% of list)
Steady-state model: a hit reads the cacheable prefix at the read rate, a miss (re)writes it at the 1.25× write rate. Write tiers are derived from the input rate (datasets publish cache reads, not writes). 30-day month; actual cost varies with prefix stability and retries.
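
The steady-state model can be sketched in a few lines of Python. This is an illustrative reconstruction, not the calculator's actual source: the `Rates` and `Workload` classes and the `monthly_costs` function are hypothetical names, and the inputs are the example workload from this page (8,000 cacheable tokens, 500 fresh tokens, 600 output tokens, 70% hit rate, 30,000 requests per month).

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Prices in USD per 1M tokens."""
    input: float
    output: float
    cache_read: float
    cache_write: float


@dataclass
class Workload:
    cacheable_tokens: int    # stable prefix sent with every request
    fresh_tokens: int        # new, non-cacheable input per request
    output_tokens: int       # generated tokens per request
    hit_rate: float          # share of requests that find a warm prefix
    requests_per_month: int


def monthly_costs(w: Workload, r: Rates) -> dict[str, float]:
    """Steady-state model: a hit reads the prefix, a miss (re)writes it."""
    # tokens-per-request * scale gives millions of tokens per month
    scale = w.requests_per_month / 1_000_000
    reads = w.cacheable_tokens * w.hit_rate * scale * r.cache_read
    writes = w.cacheable_tokens * (1 - w.hit_rate) * scale * r.cache_write
    fresh = w.fresh_tokens * scale * r.input
    output = w.output_tokens * scale * r.output
    with_cache = reads + writes + fresh + output
    input_mtok = (w.cacheable_tokens + w.fresh_tokens) * scale
    without_cache = input_mtok * r.input + output
    return {
        "cache_reads": reads,
        "cache_writes": writes,
        "fresh_input": fresh,
        "output": output,
        "with_cache": with_cache,
        "without_cache": without_cache,
        "savings": without_cache - with_cache,
        "effective_input_price": (reads + writes + fresh) / input_mtok,
    }


# Example workload from this page (assumed 30-day month at 1,000 requests/day).
rates = Rates(input=3.00, output=15.00, cache_read=0.30, cache_write=3.75)
workload = Workload(
    cacheable_tokens=2_000 + 6_000,
    fresh_tokens=500,
    output_tokens=600,
    hit_rate=0.70,
    requests_per_month=30_000,
)
costs = monthly_costs(workload, rates)
for name, value in costs.items():
    print(f"{name:>22}: ${value:,.2f}")
```

With these inputs the sketch gives about $635 per month with caching against $1,035 without, matching the figures above to the dollar.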

When does prompt caching pay off?

Caching wins when a large, stable prefix (your system prompt, tool definitions, or retrieved context) is reused across many requests. A cache read costs about a tenth of the input rate, but a write costs more than a fresh send (1.25× for a 5-minute cache, 2× for an hour), so the prefix has to be reused enough to earn that write back. The break-even point, in reads per write, is H* = (P_write − P_input) / (P_input − P_read), where P_input, P_read, and P_write are the per-token input, cache-read, and cache-write rates. H* is often less than one for the 5-minute tier, meaning the very first hit already pays for the write.
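
As a quick check of the break-even formula, here is a minimal Python sketch. The function name `break_even_reuse` is hypothetical, and the rates are the example ones used on this page ($3.00 input, $0.30 cache read, with the write rate taken as 1.25× or 2× input).

```python
def break_even_reuse(p_input: float, p_read: float, p_write: float) -> float:
    """Reads per write needed to repay the write premium (H*)."""
    if p_read >= p_input:
        raise ValueError("cache reads must be cheaper than fresh input")
    return (p_write - p_input) / (p_input - p_read)


P_INPUT = 3.00   # USD per 1M tokens (example rate)
P_READ = 0.30    # USD per 1M tokens (example rate)

five_minute = break_even_reuse(P_INPUT, P_READ, 1.25 * P_INPUT)
one_hour = break_even_reuse(P_INPUT, P_READ, 2.0 * P_INPUT)

print(f"5-minute tier: {five_minute:.2f} reads per write")  # 0.28
print(f"1-hour tier:   {one_hour:.2f} reads per write")     # 1.11
```

Under these assumed rates a single read repays a 5-minute write, while a 1-hour write needs a second read before it breaks even.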

Push the cache hit rate down and you can watch savings turn negative: when most requests miss, you pay the write premium without ever collecting the read discount. That is the failure mode this calculator is meant to surface before you turn caching on.
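
To see where savings turn negative, the sketch below computes the hit rate at which the blended prefix price equals the list input price. It assumes the steady-state model this page describes (every hit reads the prefix, every miss rewrites it); the function names are hypothetical and the rates are the example ones.

```python
def prefix_saving(hit_rate: float, p_input: float, p_read: float, p_write: float) -> float:
    """USD saved per 1M cacheable tokens versus sending them uncached."""
    blended = hit_rate * p_read + (1 - hit_rate) * p_write
    return p_input - blended


def break_even_hit_rate(p_input: float, p_read: float, p_write: float) -> float:
    """Hit rate at which prefix_saving() is exactly zero."""
    return (p_write - p_input) / (p_write - p_read)


P_INPUT, P_READ, P_WRITE = 3.00, 0.30, 3.75  # example rates, USD per 1M tokens

threshold = break_even_hit_rate(P_INPUT, P_READ, P_WRITE)
print(f"break-even hit rate: {threshold:.1%}")

for h in (0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9):
    saving = prefix_saving(h, P_INPUT, P_READ, P_WRITE)
    label = "saves" if saving > 0 else "loses"
    print(f"hit rate {h:.0%}: {label} ${abs(saving):.3f} per 1M prefix tokens")
```

With the example rates the threshold is a little under 22%, which is the same point as 0.28 reads per write expressed as a hit rate.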

Example scenario

Imagine a large system prompt reused across many requests. The first call pays a write premium to cache it; each later call reads it at a fraction of the input rate. If the entry expires before it is read again (or, on the 1-hour tier, is read only once), the premium is not repaid and you spend more. Reuse it often, with a high hit rate, and the cost of the cached portion drops toward the cache-read rate. The tool shows the exact reuse count where you cross from loss to savings.

What the inputs mean

  • Cached prefix size: the stable tokens you would cache.
  • Model: sets the input, cache-read, and cache-write rates.
  • Hit rate or reuse count: how often the cached prefix is reused.
  • Cache window: the time-to-live tier you use.

What the result means

You get net savings or loss versus not caching, the break-even reuse count, and the effective cost per request once caching is applied, so you can tell whether your reuse rate clears the premium.

Assumptions

  • Cache-read rates are roughly a tenth of input and write premiums roughly 1.25 to 2 times input, per the providers' published tiers.
  • A cached entry only helps within its time-to-live window.
  • Hit rate is your estimate of how often the prefix is actually reused before it expires.
  • Only the cached prefix is discounted, not the variable part of each request.

Where the prices come from

Input and cache read/write rates come from the source-backed pricing index, where every figure links to the provider's own page and carries a last-checked date. This tool reads those committed numbers; it never calls a provider or fetches live prices.

How the calculation works

Each cache read saves the difference between the input rate and the cache-read rate; the first write costs a one-time premium over the input rate. Net savings are the per-read saving times the number of reuses, minus the write premium. Break-even reuse is the write premium divided by the per-read saving: once you reuse the prefix more than that, caching pays off. The tool uses the providers' published cache tiers; it does not change any rate.
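
The per-prefix arithmetic described here can be sketched as follows. `net_savings` is a hypothetical helper, not the tool's own code, and the numbers are the example rates and the 8,000-token prefix used on this page.

```python
def net_savings(
    prefix_tokens: int,
    reuses: int,
    p_input: float,
    p_read: float,
    p_write: float,
) -> float:
    """USD saved (negative means a loss) for one write followed by `reuses` reads.

    Rates are in USD per 1M tokens.
    """
    mtok = prefix_tokens / 1_000_000
    per_read_saving = (p_input - p_read) * mtok
    write_premium = (p_write - p_input) * mtok
    return per_read_saving * reuses - write_premium


# Example: 8,000-token prefix at $3.00 input, $0.30 read, $3.75 write.
for n in (0, 1, 5, 20):
    saved = net_savings(8_000, n, p_input=3.00, p_read=0.30, p_write=3.75)
    print(f"{n:>2} reuses: {saved:+.4f} USD")
```

Zero reuses shows the bare write premium as a loss; each reuse after that adds one per-read saving.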

Frequently asked questions

When does prompt caching actually save money?
When you reuse a cached prefix more times than the break-even reuse count. Large, stable prompts reused frequently save the most; rarely reused prompts can cost more because of the write premium.
What is the cache write premium?
Caching charges a one-time premium (roughly 1.25 to 2 times the input rate, depending on the time-to-live tier) to store a prefix. Later reads are much cheaper, so the premium pays back only with enough reuse.
How many reuses do I need to break even?
The break-even reuse is the write premium divided by what each read saves. Enter your prefix size and model and the tool shows the exact number for your case.
Does caching work the same across providers?
The mechanics are similar, a cheap read after a pricier write, but the exact discounts and time-to-live windows differ by provider. The tool uses each provider's published cache rates.

Pricing data last checked 2026-06-01. Rates are read from the source-backed pricing index and its change history. This tool never calls a provider or fetches live prices.

ByteCosts

Cost intelligence for AI, cloud, and SaaS. Public pricing, normalized into an index and calculators that engineering and finance can use in the same room.

Catalog: 137 providers · 4,993 models · updated Jun 1, 2026

Prices via models.dev and custom scrapers · model quality benchmarks via Artificial Analysis

Disclaimer: All information provided is for reference purposes only. Actual costs may vary based on usage patterns and provider terms. Always monitor your own token consumption and billing dashboard to track real expenses.

© 2026 ByteCosts. All rights reserved.
Built on public pricing data and browser-side calculators. Figures are directional.