How much does the OpenAI GPT-5.4 API cost?

GPT-5.4 API pricing is $2.50 per million input tokens and $15.00 per million output tokens. Use our calculator at aiapicost.com for exact cost estimates based on your usage.

Which AI model is cheapest for API usage?

The cheapest AI API models change frequently. Use aiapicost.com to compare real-time pricing across 400+ models from OpenAI, Anthropic, Google, DeepSeek, and more. DeepSeek and open-source models typically offer the lowest per-token costs.

How do AI API token costs work?

AI APIs charge per token; one token is roughly 0.75 words. Costs are split into input tokens (what you send) and output tokens (what the model generates). Output tokens typically cost 2-5x more than input tokens. Prices are quoted per 1 million tokens.
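As a rough illustration of the per-million pricing described above, here is a minimal sketch that estimates the cost of a single request. The function name `request_cost_usd` and the token counts are hypothetical; the prices are the GPT-5.4 figures quoted earlier in this FAQ.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in USD for one request, given prices quoted per 1 million tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens
# at $2.50 (input) and $15.00 (output) per 1M tokens.
cost = request_cost_usd(10_000, 2_000, 2.50, 15.00)
print(f"${cost:.4f}")  # $0.0550
```

In this example the output tokens account for more than half of the cost despite being a fifth of the volume, which is why output-heavy workloads are the ones to watch.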

Claude vs ChatGPT: which is better?

Both are top-tier models. Claude excels at coding and instruction-following, while GPT-5.4 offers broader multimodal capabilities. Compare them head-to-head at aiapicost.com/compare with real benchmark data.

Live data · Updated hourly

PinchBench — Real-World AI Agent Benchmarks

How do AI models perform on real agent tasks? PinchBench scores 540+ models across coding, reasoning, tool use, and instruction following — with live pricing data.

Models Tested: 540

Avg Score: 33.5

Best Value: Qwen3.5 4B (Reasoning)

⭐ Overall

Balanced score across all agent capabilities

intelligence index (15%) · coding index (15%) · math index (10%) · gpqa (10%) · livecodebench (10%) · ifbench (10%) · tau2 (10%) · terminalbench hard (10%) · hle (10%)

🥇 #1 · Score 71.9 · Anthropic · Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
Price: $20.00 · Efficiency: 3.6

🥈 #2 · Score 70.3 · OpenAI · GPT-5.5 (xhigh)
Price: $11.25 · Efficiency: 6.3

🥉 #3 · Score 69.1 · SpaceXAI · Grok 4.5 (high)
Price: $3.00 · Speed: — · Efficiency: 23.0

#	Model	Provider	Score	Input $/M	Output $/M	Speed	TTFT	Efficiency
1	Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)	Anthropic	71.9	$10.00	$50.00	67	63.44s	3.6
2	GPT-5.5 (xhigh)	OpenAI	70.3	$5.00	$30.00	71	52.79s	6.3
3	Grok 4.5 (high)	SpaceXAI	69.1	$2.00	$6.00	—	—	23.0
4	GPT-5.2 (xhigh)	OpenAI	68.7	$1.75	$14.00	85	121.19s	14.3
5	Claude Opus 4.8 (Adaptive Reasoning, Max Effort)	Anthropic	68.5	$5.00	$25.00	62	25.18s	6.8
6	GPT-5.5 (high)	OpenAI	68.5	$5.00	$30.00	72	20.80s	6.1
7	Gemini 3 Pro Preview (high)	Google	67.5	$2.00	$12.00	—	—	15.0
8	Gemini 3.1 Pro Preview	Google	67.3	$2.00	$12.00	130	23.89s	15.0
9	GPT-5.5 (medium)	OpenAI	67.0	$5.00	$30.00	71	5.76s	6.0
10	GPT-5.4 (xhigh)	OpenAI	67.0	$2.50	$15.00	182	130.02s	11.9
11	Gemini 3 Flash Preview (Reasoning)	Google	66.6	$0.50	$3.00	209	5.83s	59.2
12	GLM-5.2 (max)	Z AI	66.6	$1.40	$4.40	189	0.91s	31.0
13	Gemini 3.5 Flash (high)	Google	65.8	$1.50	$9.00	243	13.94s	19.5
14	Qwen3.7 Max	Alibaba	65.6	$2.50	$7.50	201	1.55s	17.5
15	Claude Opus 4.7 (Adaptive Reasoning, Max Effort)	Anthropic	65.1	$5.00	$25.00	55	18.79s	6.5
16	Claude Opus 4.5 (Reasoning)	Anthropic	64.6	$5.00	$25.00	68	10.72s	6.5
17	GPT-5 Codex (high)	OpenAI	64.1	$1.25	$10.00	169	7.83s	18.7
18	Claude Sonnet 5 (Adaptive Reasoning, Max Effort)	Anthropic	63.6	$2.00	$10.00	84	141.47s	15.9
19	GPT-5.3 Codex (xhigh)	OpenAI	63.4	$1.75	$14.00	92	75.26s	13.2
20	GPT-5.2 (medium)	OpenAI	63.2	$1.75	$14.00	—	—	13.1

💰 Best Cost Efficiency — Overall

Score per dollar (higher = better value). Only models with pricing data.

Model	Score per dollar	Price
Qwen3.5 4B (Reasoning)	648.4	$0.06
HyperNova 60B 2605	582.7	$0.07
Qwen3.5 4B (Non-reasoning)	553.1	$0.06
gpt-oss-20b (high)	494.3	$0.09
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)	476.8	$0.09
Sarvam 30B (high)	469.5	$0.05
NVIDIA Nemotron Nano 9B V2 (Reasoning)	450.4	$0.07
Gemma 3n E4B Instruct	418.5	$0.03
MiMo-V2-Flash (Reasoning)	410.1	$0.15
gpt-oss-20b (low)	406.0	$0.10

⚡ Score vs Speed — Overall

Models in the top-right are both fast and capable.

Provider	Model	Score	Speed
Inception	Mercury 2	45.8	1194
StepFun	Step 3.7 Flash	50.3	393
Liquid AI	LFM2.5-VL-1.6B	11.9	455
Multiverse Computing	HyperNova 60B 2605	37.9	351
IBM	Granite 4.0 H Small	16.8	397
OpenAI	gpt-oss-120b (low)	40.6	329
Google	Gemini 3.1 Flash-Lite	40.1	311
Google	Gemini 3.5 Flash (high)	65.8	243
OpenAI	gpt-oss-120b (high)	51.7	271
Liquid AI	LFM2.5-8B-A1B	22.6	342
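One way to make "top-right" concrete is a Pareto filter: keep only the models that no other model matches or beats on both score and speed. The sketch below applies that idea to the ten models listed above. The Pareto filter is an illustrative technique chosen here, not necessarily how the PinchBench chart is built, and `pareto_front` is a hypothetical helper name.

```python
# (model, score, speed) taken from the Score vs Speed listing above.
models = [
    ("Mercury 2", 45.8, 1194),
    ("Step 3.7 Flash", 50.3, 393),
    ("LFM2.5-VL-1.6B", 11.9, 455),
    ("HyperNova 60B 2605", 37.9, 351),
    ("Granite 4.0 H Small", 16.8, 397),
    ("gpt-oss-120b (low)", 40.6, 329),
    ("Gemini 3.1 Flash-Lite", 40.1, 311),
    ("Gemini 3.5 Flash (high)", 65.8, 243),
    ("gpt-oss-120b (high)", 51.7, 271),
    ("LFM2.5-8B-A1B", 22.6, 342),
]


def pareto_front(points):
    """Return names of models not dominated on both score and speed."""
    front = []
    for name, score, speed in points:
        # A model is dominated if another is at least as good on both axes
        # and strictly better on at least one.
        dominated = any(
            s >= score and v >= speed and (s > score or v > speed)
            for _, s, v in points
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(models))
# ['Mercury 2', 'Step 3.7 Flash', 'Gemini 3.5 Flash (high)', 'gpt-oss-120b (high)']
```

The four survivors each represent a different trade-off, from the fastest model (Mercury 2) to the highest-scoring one in this listing (Gemini 3.5 Flash (high)).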

Frequently Asked Questions

What is PinchBench and how does it differ from traditional benchmarks?

PinchBench evaluates AI models on real-world agent tasks spanning coding, reasoning, tool use, and instruction following. Unlike academic benchmarks that test isolated capabilities, PinchBench combines multiple benchmark dimensions to reflect how models perform as autonomous agents in practical workflows.

Which scenarios does PinchBench test?

PinchBench covers 6 scenarios: Coding Agent (code generation, debugging, terminal use), Reasoning & Logic (math, science, multi-step problems), Instruction Following (format compliance, structured output), Research & Analysis (scientific reasoning, knowledge), Tool Use & Agentic (multi-turn orchestration, planning), and an Overall balanced score.

How are scores calculated?

Each scenario uses a weighted combination of relevant benchmarks. For example, Coding Agent combines LiveCodeBench, TerminalBench, SciCode, and the Artificial Analysis Coding Index. Scores are normalized to 0-100. Cost efficiency is calculated as score divided by price per million tokens.
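The calculation described above can be sketched as follows. The weights are the Overall weights listed on this page; the function names and the 3:1 input-to-output price blend are assumptions made for illustration. The blend is not stated on the page, but it appears consistent with the published figures (for example, Grok 4.5 at $2.00 input and $6.00 output gives a blended price of $3.00 and an efficiency of 23.0).

```python
# Overall scenario weights as listed on this page (they sum to 100%).
OVERALL_WEIGHTS = {
    "intelligence_index": 0.15,
    "coding_index": 0.15,
    "math_index": 0.10,
    "gpqa": 0.10,
    "livecodebench": 0.10,
    "ifbench": 0.10,
    "tau2": 0.10,
    "terminalbench_hard": 0.10,
    "hle": 0.10,
}


def weighted_score(benchmarks, weights):
    """Weighted average of benchmark values already normalized to 0-100."""
    total_weight = sum(weights.values())
    return sum(benchmarks[name] * w for name, w in weights.items()) / total_weight


def blended_price(input_price, output_price, input_ratio=3, output_ratio=1):
    """Assumed blend of input and output prices, in $ per 1M tokens."""
    return (input_price * input_ratio + output_price * output_ratio) / (
        input_ratio + output_ratio
    )


def efficiency(score, input_price, output_price):
    """Score divided by the (assumed) blended price per 1M tokens."""
    return score / blended_price(input_price, output_price)


print(round(blended_price(2.00, 6.00), 2))     # 3.0
print(round(efficiency(69.1, 2.00, 6.00), 1))  # 23.0
```

Dividing by the total weight keeps the result on the 0-100 scale even if a scenario's weights do not sum exactly to 1.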

Why do real-world results differ from academic benchmarks?

Academic benchmarks test specific skills in controlled conditions. Real agent tasks require combining multiple skills — a model might score well on individual benchmarks but struggle when tasks require coding + tool use + instruction following simultaneously. PinchBench's weighted scenario scores better approximate this combined performance.

How often is the data updated?

PinchBench data refreshes hourly from the Artificial Analysis API, ensuring you see the latest benchmark scores and pricing for all models.
