When does prompt caching actually save money?

When you reuse a cached prefix more times than the break-even reuse count. Large, stable prompts reused frequently save the most; rarely reused prompts can cost more because of the write premium.

What is the cache write premium?

Writing a prefix to the cache costs more than sending it normally: roughly 1.25 to 2 times the input rate, depending on the time-to-live tier. The premium is the amount above the normal input rate. Later reads are much cheaper, so the premium pays back only with enough reuse.

How many reuses do I need to break even?

The break-even reuse is the write premium divided by what each read saves. Enter your prefix size and model and the tool shows the exact number for your case.

Does caching work the same across providers?

The mechanics are similar (a cheap read after a pricier write), but the exact discounts and time-to-live windows differ by provider. The tool uses each provider's published cache rates.

Does prompt caching cut your LLM bill, or does the write premium cost you more?

Prompt caching saves money only when a cached prefix is reused enough to offset its one-time write premium. Below the break-even reuse count you pay more than without caching; above it you save, approaching the cache-read discount (often around ninety percent on the cached portion) as your hit rate rises.

This calculator is for teams with large, stable prompt prefixes (system prompts, retrieved context, few-shot examples) who are deciding whether caching is worth the write premium at their reuse rate.

Your workload

Model

… for Haiku ($1/$5) can cut a token bill ~5×.

$3.00 in · $15.00 out · read $0.300 · write $3.75 / 1M

Cacheable system + tools: 2,000 tokens

System prompt + tool schemas, sent every request

Reused context / history: 6,000 tokens

Conversation history or retrieved docs reused across requests

Fresh input / request: 500 tokens

The new turn, never cacheable

Output / request: 600 tokens

Cache hit rate: 70%

Share of requests that reuse a warm prefix

Requests / day: 1,000

Cache TTL (write tier)

Estimated monthly cache savings: $400 / month (−39%) across 30,000 requests/mo

Without cache: $1,035 / month

With cache: $635 / month

Break-even reuse: 0.3× reads per write

Cost component (with cache)	Description	Monthly	Share
Cache reads	Cached prefix on a hit	$50.40	8%
Cache writes	Prefix (re)written on a miss	$270	42%
Fresh input	New, non-cacheable tokens	$45.00	7%
Output	Generated tokens	$270	42%

Effective input price

$1.43 / 1M tokens, vs $3.00 list (48% of list).

Steady-state model: a hit reads the cacheable prefix at the read rate, a miss (re)writes it at the 1.25× write rate. Write tiers are derived from the input rate (datasets publish cache reads, not writes). 30-day month; actual cost varies with prefix stability and retries.
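The steady-state model described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the names `Rates` and `monthly_costs` are hypothetical, and it assumes the example rates shown on this page ($3.00 input, $15.00 output, $0.30 cache read, $3.75 cache write per 1M tokens) and a 30-day month.

```python
from dataclasses import dataclass

MTOK = 1_000_000  # rates are quoted in dollars per million tokens


@dataclass(frozen=True)
class Rates:
    # Assumed example rates, taken from the figures shown on this page.
    input: float = 3.00
    output: float = 15.00
    cache_read: float = 0.30
    cache_write: float = 3.75  # 1.25 x input, i.e. the 5-minute tier


def monthly_costs(prefix_tokens, fresh_tokens, output_tokens,
                  hit_rate, requests_per_day, rates=Rates(), days=30):
    """Return (with_cache_breakdown, without_cache_total) in dollars."""
    requests = requests_per_day * days

    def dollars(tokens_per_request, rate):
        return tokens_per_request * requests * rate / MTOK

    with_cache = {
        # A hit reads the prefix at the read rate ...
        "cache_reads": dollars(prefix_tokens * hit_rate, rates.cache_read),
        # ... and a miss (re)writes it at the write rate.
        "cache_writes": dollars(prefix_tokens * (1 - hit_rate), rates.cache_write),
        "fresh_input": dollars(fresh_tokens, rates.input),
        "output": dollars(output_tokens, rates.output),
    }
    without_cache = (dollars(prefix_tokens + fresh_tokens, rates.input)
                     + dollars(output_tokens, rates.output))
    return with_cache, without_cache


breakdown, baseline = monthly_costs(
    prefix_tokens=2_000 + 6_000, fresh_tokens=500, output_tokens=600,
    hit_rate=0.70, requests_per_day=1_000)
total = sum(breakdown.values())
print(f"with cache ${total:,.2f}, without ${baseline:,.2f}, "
      f"saving ${baseline - total:,.2f}")
```

With the example workload on this page (8,000 cacheable tokens, 500 fresh, 600 output, 70% hit rate, 1,000 requests a day), the sketch lands on roughly $635 a month with caching against roughly $1,035 without.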

When does prompt caching pay off?

Caching wins when a large, stable prefix — your system prompt, tool definitions, or retrieved context — is reused across many requests. A cache read costs about a tenth of the input rate, but a write costs more than a fresh send (1.25× for a 5-minute cache, 2× for an hour), so the prefix has to be reused enough to earn that write back. The break-even point is H* = (P_write − P_input) / (P_input − P_read) — often less than one reuse for the 5-minute tier, meaning the very first hit already pays for the write.
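As a quick check of the break-even formula, the sketch below evaluates H* for both write tiers. The function name `break_even_reads` is hypothetical, and the rates ($3.00 input, $0.30 read) are the example figures from this page, with the write rate taken as 1.25× or 2× the input rate.

```python
def break_even_reads(p_input, p_read, p_write):
    """H* = (P_write - P_input) / (P_input - P_read), in reads per write."""
    if p_input <= p_read:
        raise ValueError("cache reads must be cheaper than fresh input")
    return (p_write - p_input) / (p_input - p_read)


P_INPUT, P_READ = 3.00, 0.30  # assumed example rates, dollars per 1M tokens

five_min = break_even_reads(P_INPUT, P_READ, 1.25 * P_INPUT)
one_hour = break_even_reads(P_INPUT, P_READ, 2.00 * P_INPUT)
print(f"5-minute tier: {five_min:.2f} reads per write")
print(f"1-hour tier:   {one_hour:.2f} reads per write")
```

Under these assumed rates the 5-minute tier breaks even at about 0.28 reads per write, so a single hit repays the write. The 1-hour tier comes out at about 1.11, so it needs a second read to come out ahead.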

Push the cache hit rate down and you can watch savings turn negative: when most requests miss, you pay the write premium without ever collecting the read discount. That is the failure mode this calculator is meant to surface before you turn caching on.
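To see where savings turn negative, the same steady-state assumption (a hit pays the read rate, a miss pays the write rate) can be solved for the hit rate at which the blended cost of the cached prefix equals the plain input rate. The sketch below is an illustration under that assumption. The function names are hypothetical and the rates are the example figures from this page.

```python
def cached_prefix_cost(hit_rate, p_read, p_write):
    """Blended cost per 1M prefix tokens: hits read, misses rewrite."""
    return hit_rate * p_read + (1 - hit_rate) * p_write


def break_even_hit_rate(p_input, p_read, p_write):
    """Hit rate at which the blended cost equals the plain input rate."""
    return (p_write - p_input) / (p_write - p_read)


P_INPUT, P_READ, P_WRITE = 3.00, 0.30, 3.75  # assumed example rates

for h in (0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9):
    saving = P_INPUT - cached_prefix_cost(h, P_READ, P_WRITE)
    print(f"hit rate {h:.0%}: saving per 1M prefix tokens = ${saving:+.2f}")

print(f"break-even hit rate: {break_even_hit_rate(P_INPUT, P_READ, P_WRITE):.1%}")
```

With these rates the crossover sits near a 22% hit rate. Below it, every million prefix tokens costs more with caching than without.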

Example scenario

Imagine a large system prompt reused across many requests. The first call pays a write premium to cache it; each later call that arrives before the entry expires reads it at a fraction of the input rate. If the prefix is rarely or never reused before it expires, the premium is not repaid and you spend more. Reuse it often, with a high hit rate, and the cost of the cached portion drops toward the cache-read rate. The tool shows the exact reuse count where you cross from loss to savings.

What the inputs mean

Cached prefix size: the stable tokens you would cache.
Model: sets the input, cache-read, and cache-write rates.
Hit rate or reuse count: how often the cached prefix is reused.
Cache window: the time-to-live tier you use.

What the result means

You get net savings or loss versus not caching, the break-even reuse count, and the effective cost per request once caching is applied, so you can tell whether your reuse rate clears the premium.

Assumptions

Cache-read rates are roughly a tenth of the input rate, and cache-write rates are roughly 1.25 to 2 times the input rate, per the providers' published tiers.
A cached entry only helps within its time-to-live window.
Hit rate is your estimate of how often the prefix is actually reused before it expires.
Only the cached prefix is discounted, not the variable part of each request.

Where the prices come from

Input and cache read/write rates come from the source-backed pricing index, where every figure links to the provider's own page and carries a last-checked date. This tool reads those committed numbers; it never calls a provider or fetches live prices.

How the calculation works

Each cache read saves the difference between the input rate and the cache-read rate; the first write costs a one-time premium over the input rate. Net savings are the per-read saving times the number of reuses, minus the write premium. Break-even reuse is the write premium divided by the per-read saving: once you reuse the prefix more than that, caching pays off. The tool uses the providers' published cache tiers; it does not change any rate.
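The net-savings rule above can be expressed in dollars for a single cached prefix. This is a sketch with a hypothetical function name, `net_savings`. It assumes one write followed by a given number of reads within the time-to-live window, using the example rates from this page.

```python
def net_savings(prefix_tokens, reuses, p_input, p_read, p_write):
    """Dollars saved versus no caching: one write, then `reuses` reads."""
    per_read_saving = (p_input - p_read) * prefix_tokens / 1_000_000
    write_premium = (p_write - p_input) * prefix_tokens / 1_000_000
    return per_read_saving * reuses - write_premium


# Assumed example: an 8,000-token prefix at $3.00 input, $0.30 read, $3.75 write.
for reuses in (0, 1, 5, 20):
    saved = net_savings(8_000, reuses, 3.00, 0.30, 3.75)
    print(f"{reuses:>2} reuses: net ${saved:+.4f}")
```

With zero reuses the result is negative by exactly the write premium. Each additional read adds the same per-read saving, which is why the break-even count depends only on the rates and not on the prefix size.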


Pricing data last checked 2026-06-01. Rates are read from the source-backed pricing index and its change history. This tool never calls a provider or fetches live prices.