Value ranking

Benchmark-Adjusted Value Calculator

Rank capable models by cost per benchmark point for chat, coding, RAG, batch, and vision workloads.

Quality-adjusted value

Workload

Large repo context, tool-call results, long generations, and reasoning, in the style of Cursor or Claude Code. Modern agent loops routinely send 30k+ input tokens and emit 5k+ output tokens per call.

Calls / month: 8,000

Best value model

Step 3.5 Flash

$38.40 / month at 31.6 benchmark points

Value winner

$1.22

cost / point

Monthly volume

8.0K calls

32K input tokens / call
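The monthly figures follow from simple volume arithmetic: calls per month times tokens per call, priced per million tokens. The sketch below illustrates that calculation. The per-million-token prices and the 5,000-token output size are hypothetical placeholders, not values taken from this page, and `monthly_cost` is an illustrative helper name.

```python
def monthly_cost(calls, in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Monthly spend in dollars for a fixed per-call token profile.

    Prices are dollars per one million tokens.
    """
    input_cost = calls * in_tokens / 1_000_000 * in_price_per_m
    output_cost = calls * out_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


calls = 8_000          # calls per month, from the workload above
in_tokens = 32_000     # input tokens per call, from the workload above

# Total input volume the workload sends each month.
monthly_input_tokens = calls * in_tokens

# Hypothetical prices ($0.10 in, $0.40 out per million tokens) and a
# hypothetical 5,000 output tokens per call, for illustration only.
cost = monthly_cost(calls, in_tokens, 5_000, 0.10, 0.40)

print(f"{monthly_input_tokens:,} input tokens / month")
print(f"${cost:.2f} / month")
```

At this volume the input side alone is 256 million tokens per month, which is why input pricing tends to dominate the bill for agent-style workloads.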

Quality leaders

GPT-5.5

highest benchmark score

Model	Provider	Score	Monthly cost	Cost / point
Step 3.5 Flash	StepFun (China)	31.6	$38.40	$1.22
MiMo-V2-Flash	Xiaomi	33.5	$40.00	$1.19
DeepSeek V4 Flash	DeepSeek	39.8	$49.28	$1.24
DeepSeek V3.2 Exp	Alibaba (China)	36.7	$94.16	$2.57
Mercury 2	Inception	30.6	$100	$3.27
GPT-5.4 nano	OpenAI	43.9	$111	$2.53
MiniMax-M2.5	Alibaba (China)	37.4	$134	$3.59
MiniMax-M2.7	Alibaba (China)	41.9	$134	$3.21
MiniMax-M2.1	MiniMax (minimax.io)	32.8	$134	$4.10
Qwen3.6 35B-A3B	Alibaba	35.2	$135	$3.83
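Cost per point is the monthly cost divided by the benchmark score. As a quick check, the sketch below recomputes it for the first four rows from their score and monthly cost; the variable and function names are illustrative only.

```python
# (model, score, monthly cost in dollars) for the first four table rows.
rows = [
    ("Step 3.5 Flash", 31.6, 38.40),
    ("MiMo-V2-Flash", 33.5, 40.00),
    ("DeepSeek V4 Flash", 39.8, 49.28),
    ("DeepSeek V3.2 Exp", 36.7, 94.16),
]


def cost_per_point(monthly_cost, score):
    """Dollars spent per benchmark point per month."""
    return monthly_cost / score


# Format to cents, the precision the table uses.
per_point = {name: f"{cost_per_point(cost, score):.2f}" for name, score, cost in rows}

for name, value in per_point.items():
    print(f"{name}: ${value} / point")
```

This gives $1.22, $1.19, $1.24, and $2.57 per point respectively.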

The score is the Artificial Analysis index selected by the workload preset. A cheap model appears only when it clears the preset's capability and benchmark thresholds.
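One way to picture the threshold rule is as a filter applied before any ranking. The sketch below is a minimal illustration under stated assumptions: the `Candidate` fields, the capability names, the threshold of 30, the sample models, and the choice of cost per point as the sort key are all hypothetical, and the calculator's actual thresholds and ordering logic are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float              # benchmark index for the chosen preset
    monthly_cost: float       # dollars per month for the workload
    capabilities: frozenset   # e.g. {"tools", "vision"} (hypothetical labels)


def eligible(c, min_score, required):
    """A model qualifies only if it meets the score floor and has every required capability."""
    return c.score >= min_score and required <= c.capabilities


def rank_by_value(candidates, min_score, required):
    """Drop ineligible models, then sort the rest by cost per benchmark point."""
    kept = [c for c in candidates if eligible(c, min_score, required)]
    return sorted(kept, key=lambda c: c.monthly_cost / c.score)


# Hypothetical candidates, not taken from the table above.
candidates = [
    Candidate("model-a", 25.0, 10.0, frozenset({"tools"})),  # cheapest, but below the score floor
    Candidate("model-b", 35.0, 50.0, frozenset({"tools"})),
    Candidate("model-c", 40.0, 48.0, frozenset({"tools"})),
    Candidate("model-d", 45.0, 30.0, frozenset()),           # strong, but lacks tool calling
]

ranked = rank_by_value(candidates, min_score=30.0, required=frozenset({"tools"}))
names = [c.name for c in ranked]
print(names)
```

Filtering first keeps a very cheap but weak model from winning on price alone, which matches the intent of the rule stated above.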