How much does the OpenAI GPT-5.4 API cost?

GPT-5.4 API pricing is $2.50 per million input tokens and $15.00 per million output tokens. Use our calculator at aiapicost.com for exact cost estimates based on your usage.

Which AI model is cheapest for API usage?

The cheapest AI API models change frequently. Use aiapicost.com to compare real-time pricing across 400+ models from OpenAI, Anthropic, Google, DeepSeek, and more. DeepSeek and open-source models typically offer the lowest per-token costs.

How do AI API token costs work?

AI APIs charge per token; one token is roughly 0.75 English words. Costs are split into input tokens (what you send) and output tokens (what the model generates), and output tokens are typically 2-5x more expensive than input tokens. Prices are quoted per 1 million tokens.
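As a rough sketch of the arithmetic above, the following Python snippet estimates the cost of a single request from its token counts. The function names and the example word counts are illustrative assumptions; the prices are the GPT-5.4 rates quoted earlier ($2.50 input, $15.00 output per 1M tokens). The `blended_price` helper assumes a 3:1 input-to-output weighting. That weighting is an inference, not something this page states, although it does reproduce the $5.6 shown for GPT-5.4 (xhigh) in the speed table.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated USD cost of one API call; prices are per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def words_to_tokens(words: int, words_per_token: float = 0.75) -> int:
    """Approximate a token count from a word count (1 token ~ 0.75 words)."""
    return round(words / words_per_token)


def blended_price(input_price_per_m: float, output_price_per_m: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Single per-1M price under an ASSUMED input:output token mix (default 3:1)."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price_per_m
            + output_parts * output_price_per_m) / total_parts


# Hypothetical request: a 1,500-word prompt and a 300-word answer.
prompt_tokens = words_to_tokens(1500)   # 2000 tokens
answer_tokens = words_to_tokens(300)    # 400 tokens

cost = request_cost(prompt_tokens, answer_tokens, 2.50, 15.00)
print(f"${cost:.4f}")                         # $0.0110
print(f"${blended_price(2.50, 15.00):.3f}")   # $5.625
```

In this example the 400 output tokens cost more ($0.006) than the 2,000 input tokens ($0.005), which is why output length tends to dominate the bill for chat-style workloads.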

Claude vs ChatGPT: which is better?

Both are top-tier models. Claude excels at coding and instruction-following, while GPT-5.4 offers broader multimodal capabilities. Compare them head-to-head at aiapicost.com/compare with real benchmark data.

Fastest AI Models 2026 — Live Speed Leaderboard (Mercury 2, 1,101 tok/s)

Compare real-world speed for 540+ AI models: response latency, time to first token (TTFT), and throughput (tokens/sec). We pull live performance data from Artificial Analysis and pair it with per-token pricing so you can spot the fastest and most cost-efficient models for production APIs, real-time chat, and code completion.

For raw benchmark scores (GPQA, AIME, MMLU-Pro, HLE, SWE-bench), see the Benchmarks leaderboard. For a pricing-only comparison, try the LLM cost calculator.

Top 5 Fastest AI Models Right Now

1. Mercury 2: 1194.3 tok/s
2. LFM2.5-VL-1.6B: 455.0 tok/s
3. Granite 4.0 H Small: 396.8 tok/s
4. Step 3.7 Flash: 393.2 tok/s
5. Granite 3.3 8B (Non-reasoning): 354.3 tok/s

Live speed data from Artificial Analysis API

AI Model Speed Rankings

Compare 540+ AI models by response speed, latency, and throughput. Find the fastest models for your use case.

540 models

| Model | Provider | Throughput | TTFT | $/1M | $/Speed | Price×TTFT |
|---|---|---|---|---|---|---|
| Mercury 2 | Inception | 1194 t/s | 3.02s | $0.38 | $0.000 | 1130.6 |
| LFM2.5-VL-1.6B | Liquid AI | 455 t/s | 8.73s | $0.00 | — | — |
| Granite 4.0 H Small | IBM | 397 t/s | 8.73s | $0.11 | $0.000 | 934.4 |
| Step 3.7 Flash | StepFun | 393 t/s | 749ms | $0.44 | $0.001 | 328.1 |
| Granite 3.3 8B (Non-reasoning) | IBM | 354 t/s | 21.19s | $0.09 | $0.000 | 1801.2 |
| HyperNova 60B 2605 | Multiverse Computing | 351 t/s | 774ms | $0.07 | $0.000 | 50.3 |
| LFM2.5-8B-A1B | Liquid AI | 342 t/s | 10.24s | $0.00 | — | — |
| Nova Micro | Amazon | 340 t/s | 625ms | $0.06 | $0.000 | 38.1 |
| gpt-oss-120b (low) | OpenAI | 329 t/s | 502ms | $0.26 | $0.001 | 131.5 |
| Gemini 3.1 Flash-Lite | Google | 311 t/s | 5.11s | $0.56 | $0.002 | 2878.1 |
| Llama 3.1 Nemotron Instruct 70B | NVIDIA | 307 t/s | 4.03s | $1.2 | $0.004 | 4836.0 |
| NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | 297 t/s | 237ms | $0.30 | $0.001 | 71.1 |
| Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 288 t/s | 567ms | $0.13 | $0.000 | 74.3 |
| gpt-oss-120b (high) | OpenAI | 271 t/s | 506ms | $0.26 | $0.001 | 132.6 |
| Gemini 2.5 Flash-Lite (Reasoning) | Google | 263 t/s | 24.82s | $0.17 | $0.001 | 4344.4 |
| Qwen3.5 Omni Flash | Alibaba | 246 t/s | 1.06s | $0.28 | $0.001 | 292.6 |
| Gemini 3.5 Flash (high) | Google | 243 t/s | 13.94s | $3.4 | $0.014 | 47050.9 |
| Nova 2.0 Lite (Non-reasoning) | Amazon | 237 t/s | 840ms | $0.85 | $0.004 | 714.0 |
| Sarvam 30B (high) | Sarvam | 236 t/s | 1.15s | $0.05 | $0.000 | 54.0 |
| o3-mini | OpenAI | 232 t/s | 5.54s | $1.9 | $0.008 | 10668.4 |
| Gemini 2.5 Flash (Reasoning) | Google | 229 t/s | 13.84s | $0.85 | $0.004 | 11761.5 |
| gpt-oss-20b (low) | OpenAI | 227 t/s | 492ms | $0.10 | $0.000 | 46.7 |
| Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 222 t/s | 298ms | $0.17 | $0.001 | 52.1 |
| NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 218 t/s | 767ms | $0.30 | $0.001 | 230.1 |
| o3-mini (high) | OpenAI | 217 t/s | 16.68s | $1.9 | $0.009 | 32103.2 |
| Gemini 3.5 Flash (minimal) | Google | 214 t/s | 698ms | $3.4 | $0.016 | 2355.8 |
| Gemini 3.5 Flash (medium) | Google | 213 t/s | 12.22s | $3.4 | $0.016 | 41252.6 |
| Gemini 2.5 Flash (Non-reasoning) | Google | 211 t/s | 403ms | $0.85 | $0.004 | 342.6 |
| GPT-5.1 Codex mini (high) | OpenAI | 211 t/s | 6.24s | $0.69 | $0.003 | 4291.1 |
| Gemini 3 Flash Preview (Reasoning) | Google | 209 t/s | 5.83s | $1.1 | $0.005 | 6564.4 |
| Hy3-preview (Reasoning) | Tencent | 207 t/s | 1.79s | $0.20 | $0.001 | 357.8 |
| Gemini 3 Flash Preview (Non-reasoning) | Google | 207 t/s | 686ms | $1.1 | $0.005 | 771.8 |
| GPT-4o (Nov '24) | OpenAI | 207 t/s | 497ms | $4.4 | $0.021 | 2174.4 |
| Trinity Large Thinking | Arcee AI | 202 t/s | 682ms | $0.40 | $0.002 | 269.4 |
| Qwen3.7 Max | Alibaba | 201 t/s | 1.55s | $3.8 | $0.019 | 5808.8 |
| GPT-5.1 Codex (high) | OpenAI | 201 t/s | 4.71s | $3.4 | $0.017 | 16186.1 |
| Nemotron 3 Ultra 550B A55B (Reasoning) | NVIDIA | 197 t/s | 923ms | $1.2 | $0.006 | 1084.5 |
| gpt-oss-20b (high) | OpenAI | 197 t/s | 457ms | $0.09 | $0.000 | 40.2 |
| GLM-5.2 (max) | Z AI | 189 t/s | 908ms | $2.1 | $0.011 | 1952.2 |
| Hy3-preview (Non-reasoning) | Tencent | 189 t/s | 1.87s | $0.20 | $0.001 | 374.2 |
| Qwen3 Next 80B A3B Instruct | Alibaba | 188 t/s | 1.12s | $0.88 | $0.005 | 983.5 |
| Ling 2.6 Flash | InclusionAI | 182 t/s | 695ms | $0.15 | $0.001 | 104.3 |
| GPT-5.4 (xhigh) | OpenAI | 182 t/s | 130.02s | $5.6 | $0.031 | 731379.4 |
| Jamba 1.6 Mini | AI21 Labs | 181 t/s | 788ms | $0.25 | $0.001 | 197.0 |
| Command A+ | Cohere | 179 t/s | 180ms | $0.00 | — | — |
| Step 3.5 Flash | StepFun | 178 t/s | 821ms | $0.15 | $0.001 | 123.1 |
| GPT-5.4 mini (xhigh) | OpenAI | 178 t/s | 11.20s | $1.7 | $0.009 | 18910.7 |
| Step 3.5 Flash 2603 | StepFun | 177 t/s | 871ms | $0.15 | $0.001 | 130.6 |
| Qwen3.5 35B A3B (Non-reasoning) | Alibaba | 177 t/s | 1.27s | $0.69 | $0.004 | 873.1 |
| Nova Lite | Amazon | 174 t/s | 658ms | $0.10 | $0.001 | 69.1 |
| Nova 2.0 Lite (low) | Amazon | 173 t/s | 4.77s | $0.85 | $0.005 | 4052.0 |
| Mistral Small (Sep '24) | Mistral | 173 t/s | 580ms | $0.30 | $0.002 | 174.0 |
| Nova 2.0 Lite (medium) | Amazon | 172 t/s | 17.46s | $0.85 | $0.005 | 14842.7 |
| GPT-5 (ChatGPT) | OpenAI | 171 t/s | 494ms | $3.4 | $0.020 | 1698.4 |
| GPT-5 Codex (high) | OpenAI | 169 t/s | 7.83s | $3.4 | $0.020 | 26923.0 |
| Mistral Small 3 | Mistral | 169 t/s | 519ms | $0.15 | $0.001 | 77.9 |
| Mistral Small 4 (Reasoning) | Mistral | 168 t/s | 561ms | $0.26 | $0.002 | 147.0 |
| Kimi K2 Thinking | Kimi | 168 t/s | 786ms | $1.1 | $0.006 | 844.9 |
| GPT-5.4 mini (medium) | OpenAI | 168 t/s | 10.71s | $1.7 | $0.010 | 18071.7 |
| Mistral Small (Feb '24) | Mistral | 167 t/s | 535ms | $1.5 | $0.009 | 802.5 |
| Ministral 3 3B | Mistral | 167 t/s | 420ms | $0.10 | $0.001 | 42.0 |
| GPT-4.1 nano | OpenAI | 166 t/s | 521ms | $0.17 | $0.001 | 91.2 |
| Nova 2.0 Lite (high) | Amazon | 166 t/s | 23.42s | $0.85 | $0.005 | 19909.5 |
| Grok 4.20 0309 v2 (Reasoning) | SpaceXAI | 165 t/s | 17.66s | $3.0 | $0.018 | 52974.0 |
| Mistral Small 4 (Non-reasoning) | Mistral | 165 t/s | 547ms | $0.26 | $0.002 | 143.3 |
| Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 165 t/s | 1.25s | $1.1 | $0.007 | 1378.3 |
| Qwen3.5 35B A3B (Reasoning) | Alibaba | 164 t/s | 1.23s | $0.69 | $0.004 | 844.9 |
| Mistral Small 3.1 | Mistral | 163 t/s | 525ms | $0.15 | $0.001 | 78.8 |
| GPT-5.4 nano (Non-Reasoning) | OpenAI | 162 t/s | 584ms | $0.46 | $0.003 | 270.4 |
| GPT-5.4 mini (Non-Reasoning) | OpenAI | 162 t/s | 551ms | $1.7 | $0.010 | 930.1 |
| o4-mini (high) | OpenAI | 157 t/s | 17.81s | $1.9 | $0.012 | 34291.9 |
| NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 157 t/s | 1.10s | $0.09 | $0.001 | 94.9 |
| Qwen3 30B A3B 2507 Instruct | Alibaba | 156 t/s | 1.02s | $0.35 | $0.002 | 357.3 |
| GPT-5.4 nano (medium) | OpenAI | 156 t/s | 3.77s | $0.46 | $0.003 | 1747.8 |
| GPT-5 nano (minimal) | OpenAI | 156 t/s | 691ms | $0.14 | $0.001 | 95.4 |
| GPT-5 nano (medium) | OpenAI | 155 t/s | 53.10s | $0.14 | $0.001 | 7327.9 |
| Llama 3.1 Instruct 8B | | | | | | |
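The TTFT values in the rankings mix units (for example 749ms and 3.02s), so they need normalizing before they can be compared or filtered. The sketch below shows one way to do that in Python. The helper names, the 500 ms latency budget, and the choice of five sample rows are assumptions for illustration; the row values are copied from the rankings.

```python
def parse_ttft_ms(text: str) -> float:
    """Convert a TTFT string such as '749ms' or '3.02s' to milliseconds."""
    text = text.strip()
    if text.endswith("ms"):
        return float(text[:-2])
    if text.endswith("s"):
        return round(float(text[:-1]) * 1000, 3)
    raise ValueError(f"unrecognized TTFT value: {text!r}")


def parse_throughput(text: str) -> float:
    """Convert a throughput string such as '393 t/s' to a number."""
    return float(text.replace("t/s", "").strip())


# (model, throughput, TTFT) copied from a few rows of the rankings.
rows = [
    ("Mercury 2", "1194 t/s", "3.02s"),
    ("Step 3.7 Flash", "393 t/s", "749ms"),
    ("NVIDIA Nemotron Nano 12B v2 VL (Reasoning)", "297 t/s", "237ms"),
    ("gpt-oss-20b (low)", "227 t/s", "492ms"),
    ("Command A+", "179 t/s", "180ms"),
]

# Keep models under a hypothetical 500 ms TTFT budget, fastest throughput first.
budget_ms = 500
fast_start = sorted(
    (r for r in rows if parse_ttft_ms(r[2]) < budget_ms),
    key=lambda r: parse_throughput(r[1]),
    reverse=True,
)
for model, throughput, ttft in fast_start:
    print(f"{model}: {throughput}, TTFT {ttft}")
```

With these sample rows, the throughput leader Mercury 2 drops out because its 3.02s TTFT exceeds the budget, which shows why the two metrics are worth checking together for interactive use cases.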

Speed Metrics Guide

Throughput (tokens/s)

Output generation speed in tokens per second. Higher is better.

Good: >50 t/s · Excellent: >100 t/s

Time to First Token (TTFT)

Delay before the first token appears. Lower is better.

Good: <500ms · Excellent: <200ms
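Throughput and TTFT combine into an end-to-end response time. A minimal sketch, assuming the model streams at a steady rate once the first token arrives (a simplification; real throughput varies with load and prompt length): total time is roughly TTFT plus output tokens divided by throughput. The function names, the "below good" label, and the 500-token answer length are illustrative assumptions; the thresholds are the ones in this guide and the figures come from the rankings.

```python
def estimated_response_seconds(ttft_s: float, throughput_tps: float,
                               output_tokens: int) -> float:
    """Rough wall-clock time for a full response: wait for first token, then stream."""
    return ttft_s + output_tokens / throughput_tps


def rate_throughput(tps: float) -> str:
    """Apply the guide's thresholds: Good > 50 t/s, Excellent > 100 t/s."""
    if tps > 100:
        return "excellent"
    if tps > 50:
        return "good"
    return "below good"


def rate_ttft(ttft_ms: float) -> str:
    """Apply the guide's thresholds: Good < 500 ms, Excellent < 200 ms."""
    if ttft_ms < 200:
        return "excellent"
    if ttft_ms < 500:
        return "good"
    return "below good"


# Figures from the rankings: Mercury 2 (1194 t/s, 3.02s), Nova Micro (340 t/s, 625ms).
answer_tokens = 500  # hypothetical answer length
mercury = estimated_response_seconds(3.02, 1194, answer_tokens)
nova = estimated_response_seconds(0.625, 340, answer_tokens)
print(f"Mercury 2:  {mercury:.2f}s")  # 3.44s
print(f"Nova Micro: {nova:.2f}s")     # 2.10s
print(rate_throughput(1194), rate_ttft(3020))  # excellent below good
```

For a 500-token answer, the model with the lower throughput finishes first because its TTFT is much shorter. For short responses TTFT dominates; throughput matters more as the output grows.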

Price/Performance

Cost efficiency ratios. Lower values indicate better value.

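$/Speed: the $/1M price divided by throughput in t/s · Price×TTFT: the $/1M price multiplied by TTFT in milliseconds, so slow-starting models are penalized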
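The two ratios can be reproduced from the other columns. The sketch below is inferred from the published rows rather than from a documented formula: it assumes $/Speed is the $/1M price divided by throughput, and Price×TTFT is the $/1M price multiplied by TTFT in milliseconds. Both function names are hypothetical. Because the page shows rounded prices, recomputed values can differ slightly from the listed ones for some models.

```python
def price_per_speed(price_per_m: float, throughput_tps: float) -> float:
    """$/Speed: dollars per 1M tokens for each token/second of throughput."""
    return price_per_m / throughput_tps


def price_times_ttft(price_per_m: float, ttft_ms: float) -> float:
    """Price x TTFT: $/1M price scaled by time to first token (ms, assumed unit)."""
    return price_per_m * ttft_ms


# Ministral 3 3B row from the rankings: 167 t/s, 420ms, $0.10 per 1M tokens.
speed_ratio = price_per_speed(0.10, 167)
latency_penalty = price_times_ttft(0.10, 420)
print(f"${speed_ratio:.3f}")     # $0.001 (listed: $0.001)
print(f"{latency_penalty:.1f}")  # 42.0   (listed: 42.0)
```

Neither ratio has a natural unit, so they are most useful for ranking models against each other rather than as absolute costs.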

