Version 3 · Current

ChronoBench

Measuring how far different language models can progress in Chrono Trigger using a vision-based agent with evidence-gated checkpoints and a first-class exploration track.

Last updated: 2026-05-10 · 200-cycle budget · 8 primary + 10 secondary checkpoints.

Leaderboard

Model Provider Last Checkpoint 1 2 3 4 5 6 7 8 Cycles Stuck Skills Superego Secondary Ev. rejected Tokens In ms/cycle Est. Cost Date
google/gemini-3-flash-preview OpenRouter 1000ad_left 14 45 50 76 94 104 107 182 200* 33 8 34 2/10 4,553,617 12,228 $2.44 2026-04-20
google/gemini-3-flash-preview OpenRouter telepod_reached 14 29 70 82 176 185 188 200* 27 1 30 1/10 4,787,830 13,034 $2.59 2026-04-19
google/gemma-4-26b-a4b Local marle_met 23 40 52 136 200* 39 1 40 1/10 2,787,833 26,434 Free 2026-04-20
google/gemma-4-e4b Local fair_entered 19 34 119 200* 118 2 30 2,931,298 22,651 Free 2026-04-20
x-ai/grok-4.3 OpenRouter fair_entered 18 18 117 200* 30 33 1/10 2,422,871 34,489 $3.64 2026-05-09
openai/gpt-5.4-nano Local house_exit 48 58 134 200* 102 4 37 2,490,333 16,049 Free 2026-04-19
qwen/qwen3.6-35b-a3b OpenRouter house_exit 5 200* 13 20 2,537,849 41,733 $0.00 2026-04-20
google/gemini-3.1-flash-lite OpenRouter house_exit 18 33 200* 25 54 1/10 4,657,715 11,588 $1.24 2026-05-08
mistralai/mistral-medium-3-5 OpenRouter house_exit 18 18 200* 18 1 22 1/10 2,593,600 12,474 $4.27 2026-05-09
moonshotai/kimi-k2.6 OpenRouter 12 0 3 123,714 151,848 $0.18 2026-04-20

* = cycle budget exhausted

Cycles per checkpoint

Cycles at each primary checkpoint, by model:

Left Bedroom: google/gemini-3-flash-preview 14, google/gemma-4-26b-a4b 23, google/gemma-4-e4b 19, x-ai/grok-4.3 18, openai/gpt-5.4-nano 48, google/gemini-3.1-flash-lite 18, mistralai/mistral-medium-3-5 18
Exited House: google/gemini-3-flash-preview 45, google/gemma-4-26b-a4b 40, google/gemma-4-e4b 34, x-ai/grok-4.3 18, openai/gpt-5.4-nano 58, qwen/qwen3.6-35b-a3b 5, google/gemini-3.1-flash-lite 33, mistralai/mistral-medium-3-5 18
Reached the Fair: google/gemini-3-flash-preview 50, google/gemma-4-26b-a4b 52, google/gemma-4-e4b 119, x-ai/grok-4.3 117, openai/gpt-5.4-nano 134
Met Marle: google/gemini-3-flash-preview 76, google/gemma-4-26b-a4b 136
Telepod Demo Announced: google/gemini-3-flash-preview 94
Passed Candy Gate: google/gemini-3-flash-preview 104
Reached Telepod: google/gemini-3-flash-preview 107
Time Traveled: google/gemini-3-flash-preview 182

Estimated cost per checkpoint

Cost ($) at each primary checkpoint, by model:

Left Bedroom: google/gemini-3-flash-preview $0.17, x-ai/grok-4.3 $0.33, google/gemini-3.1-flash-lite $0.11, mistralai/mistral-medium-3-5 $0.38
Exited House: google/gemini-3-flash-preview $0.55, x-ai/grok-4.3 $0.33, google/gemini-3.1-flash-lite $0.20, mistralai/mistral-medium-3-5 $0.38
Reached the Fair: google/gemini-3-flash-preview $0.61, x-ai/grok-4.3 $2.13
Met Marle: google/gemini-3-flash-preview $0.93
Telepod Demo Announced: google/gemini-3-flash-preview $1.15
Passed Candy Gate: google/gemini-3-flash-preview $1.27
Reached Telepod: google/gemini-3-flash-preview $1.31
Time Traveled: google/gemini-3-flash-preview $2.22
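
The per-checkpoint costs listed here are consistent with spreading each run's total estimated cost evenly over its cycles. That is an inference from the published numbers, not a documented method, and the function name `prorated_cost` is hypothetical. A minimal sketch of the arithmetic, assuming linear proration over the 200-cycle budget:

```python
# Assumption: cost accrues evenly per cycle, so the cost at a checkpoint is
# total_cost * checkpoint_cycle / total_cycles. This is inferred from the
# leaderboard figures, not taken from the benchmark's own code.

def prorated_cost(total_cost: float, checkpoint_cycle: int, total_cycles: int = 200) -> float:
    """Estimate the cost spent by the time a checkpoint was reached."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return total_cost * checkpoint_cycle / total_cycles


# Values from the leaderboard row for the best google/gemini-3-flash-preview run.
total = 2.44
checkpoint_cycles = {"Left Bedroom": 14, "Exited House": 45, "Time Traveled": 182}

for name, cycle in checkpoint_cycles.items():
    print(f"{name}: ${prorated_cost(total, cycle):.2f}")
```

With these inputs the sketch gives $0.17, $0.55, and $2.22, matching the listed values. The same rule gives $2.13 for x-ai/grok-4.3 at cycle 117 of a $3.64 run.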

Exploration track

Secondary checkpoints reached by each model's best run. These reward engaging with the world beyond the critical path and do not affect primary rank.

Model allowance gato soda race_bet melchior cat_returned lunch_eaten pendant_sold bekkler_lab wait_battle_mode Total
google/gemini-3-flash-preview 25 1 2/10
google/gemma-4-26b-a4b 81 1/10
google/gemma-4-e4b 0/10
x-ai/grok-4.3 54 1/10
openai/gpt-5.4-nano 0/10
qwen/qwen3.6-35b-a3b 0/10
google/gemini-3.1-flash-lite 26 1/10
mistralai/mistral-medium-3-5 137 1/10
moonshotai/kimi-k2.6 0/10

Cells show the first cycle at which each secondary checkpoint was confirmed.
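
The rule that secondary checkpoints do not affect primary rank can be sketched as a sort that looks only at primary progress. This is an illustration, not the benchmark's implementation: the `Run` type and `primary_rank` function are hypothetical, and the page does not state how ties are broken, so ties here simply keep their input order.

```python
from dataclasses import dataclass


@dataclass
class Run:
    model: str
    primary_reached: int    # primary checkpoints confirmed (0 to 8)
    secondary_reached: int  # secondary checkpoints confirmed (0 to 10)


def primary_rank(runs: list[Run]) -> list[Run]:
    """Order runs by primary progress only; the secondary count is never read."""
    # sorted() is stable, even with reverse=True, so tied runs keep input order.
    # Assumption: the real tie-break rule is not documented on the page.
    return sorted(runs, key=lambda r: r.primary_reached, reverse=True)


# Counts taken from the leaderboard and exploration-track tables.
runs = [
    Run("google/gemma-4-e4b", 3, 0),
    Run("x-ai/grok-4.3", 3, 1),
    Run("google/gemini-3-flash-preview", 8, 2),
    Run("google/gemma-4-26b-a4b", 4, 1),
]

for position, run in enumerate(primary_rank(runs), start=1):
    print(position, run.model, f"{run.secondary_reached}/10")
```

In this sketch x-ai/grok-4.3 stays below google/gemma-4-e4b despite having one secondary checkpoint, because both reached the same number of primary checkpoints and the secondary count is ignored.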