Example scenario
Imagine a large system prompt reused across many requests. The first call pays a write premium to cache it; each later call that hits the cache reads it at a fraction of the input rate. If too few calls reuse it before it expires, the premium is not repaid and you spend more. Reuse it often, with a high hit rate, and the cost of the cached portion drops toward the cache-read rate. The tool shows the reuse count at which you cross from loss to savings.
What the inputs mean
- Cached prefix size: the stable tokens you would cache.
- Model: sets the input, cache-read, and cache-write rates.
- Hit rate or reuse count: how often the cached prefix is reused.
- Cache window: the time-to-live tier you use.
What the result means
You get net savings or loss versus not caching, the break-even reuse count, and the effective cost per request once caching is applied, so you can tell whether your reuse rate clears the premium.
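To make the "effective cost per request" figure concrete, here is a minimal sketch of one way it could be computed. It is not the tool's actual code: the function names, the per-million-token rate units, and the sample rates are all assumptions for illustration, and it counts input tokens only, with every reuse assumed to hit the cache.

```python
def effective_cost_per_request(prefix_tokens, variable_tokens, reuses,
                               input_rate, read_rate, write_rate):
    """Average input cost per request with caching (hypothetical helper).

    Rates are per million tokens. One request writes the prefix to the
    cache; `reuses` further requests read it. The variable part of every
    request is billed at the plain input rate.
    """
    requests = reuses + 1
    prefix_cost = prefix_tokens * (write_rate + reuses * read_rate) / 1e6
    variable_cost = requests * variable_tokens * input_rate / 1e6
    return (prefix_cost + variable_cost) / requests


def uncached_cost_per_request(prefix_tokens, variable_tokens, input_rate):
    """Input cost per request when nothing is cached."""
    return (prefix_tokens + variable_tokens) * input_rate / 1e6


# Illustrative rates only, not taken from any pricing index.
cached = effective_cost_per_request(10_000, 500, reuses=9,
                                    input_rate=3.00, read_rate=0.30,
                                    write_rate=3.75)
plain = uncached_cost_per_request(10_000, 500, input_rate=3.00)
print(f"cached: {cached:.5f}  uncached: {plain:.5f}")
```

With zero reuses the same function returns more than the uncached cost, which is the "loss" case the result reports.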
Assumptions
- Cache-read rates are roughly a tenth of the input rate, and cache-write rates roughly 1.25 to 2 times the input rate (a premium of 0.25 to 1 times input), per the providers' published tiers.
- A cached entry only helps within its time-to-live window.
- Hit rate is your estimate of how often the prefix is actually reused before it expires.
- Only the cached prefix is discounted, not the variable part of each request.
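Because these assumptions express the cache rates as multiples of the input rate, the break-even point can be estimated without any absolute prices. The sketch below is illustrative only: `break_even_reuses` is a hypothetical helper, the multipliers are the rough figures listed above rather than a specific provider's rates, and every reuse is assumed to land inside the time-to-live window.

```python
def break_even_reuses(read_multiplier, write_multiplier):
    """Reuses needed to repay the write premium.

    Both arguments are multiples of the input rate, e.g. 0.1 for a
    cache read and 1.25 for a cache write (assumed, rough figures).
    """
    per_read_saving = 1.0 - read_multiplier   # saved on each cache read
    write_premium = write_multiplier - 1.0    # extra paid on the first write
    if per_read_saving <= 0:
        raise ValueError("cache reads must be cheaper than plain input")
    return write_premium / per_read_saving


for write_multiplier in (1.25, 2.0):
    print(write_multiplier, round(break_even_reuses(0.1, write_multiplier), 2))
# 1.25 0.28
# 2.0 1.11
```

Under these multipliers the premium is repaid within the first one or two successful reuses; a lower real-world hit rate pushes the number of requests needed higher.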
Where the prices come from
Input and cache read/write rates come from the source-backed pricing index, where every figure links to the provider's own page and carries a last-checked date. This tool reads those committed numbers; it never calls a provider or fetches live prices.
How the calculation works
Each cache read saves the difference between the input rate and the cache-read rate on every cached token; the first write costs a one-time premium, the cache-write rate minus the input rate, on those same tokens. Net savings are the per-read saving times the number of reuses, minus the write premium. Break-even reuse is the write premium divided by the per-read saving: once you reuse the prefix more than that many times, caching pays off. The tool uses the providers' published cache tiers; it does not change any rate.
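The arithmetic described above can be sketched in a few lines. This is an assumed reconstruction from the description, not the tool's source: the function name, the per-million-token units, and the sample rates are illustrative, and each reuse is treated as a cache hit.

```python
def cache_savings(prefix_tokens, reuses, input_rate, read_rate, write_rate):
    """Return (net_savings, break_even_reuses) for one cached prefix.

    Rates are per million tokens. `reuses` counts cache reads after the
    initial write. Hypothetical helper for illustration.
    """
    scale = prefix_tokens / 1e6
    per_read_saving = (input_rate - read_rate) * scale   # saved per reuse
    write_premium = (write_rate - input_rate) * scale    # paid once
    if per_read_saving <= 0:
        raise ValueError("cache-read rate must be below the input rate")
    net_savings = per_read_saving * reuses - write_premium
    break_even = write_premium / per_read_saving
    return net_savings, break_even


# Illustrative rates only: input 3.00, cache read 0.30, cache write 3.75.
net, break_even = cache_savings(10_000, reuses=5, input_rate=3.00,
                                read_rate=0.30, write_rate=3.75)
print(f"net savings: {net:.4f}, break-even reuses: {break_even:.2f}")
```

A negative `net_savings` means the reuse count has not yet cleared the write premium. Note that the break-even value does not depend on the prefix size, since both the saving and the premium scale with it.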