Version 1 · Archive

ChronoBench

Measuring how far different language models can progress in Chrono Trigger using an autonomous vision-based game agent.

Last updated: 2026-04-18 · Frozen at 17 runs.

These runs were produced by the v1 harness. For current runs on the v3 architecture, see Version 3, or browse the intermediate v2 archive.

Leaderboard

Model                          Provider    Last Checkpoint  1    2    3    4    5    6    Cycles  Stuck  Tokens In  ms/cycle  Est. Cost  Date
google/gemma-4-26b-a4b         Local       marle_met        18   24   48   180  -    -    200*    49     2,018,372  24,398    Free       2026-04-11
google/gemini-3-flash-preview  OpenRouter  marle_met        24   30   72   90   -    -    200*    52     3,726,277  13,687    $2.05      2026-04-16
google/gemini-3-flash-preview  OpenRouter  marle_met        14   21   72   137  -    -    200*    53     4,465,031  12,932    $2.42      2026-04-17
google/gemini-3-flash-preview  OpenRouter  marle_met        15   18   33   69   -    -    200*    68     4,446,619  14,076    $2.41      2026-04-17
google/gemini-3-flash-preview  OpenRouter  marle_met        14   42   51   74   -    -    200*    33     4,576,344  14,267    $2.47      2026-04-17
google/gemini-3-flash-preview  Local       marle_met        14   63   92   99   -    -    200*    68     3,158,124  13,429    Free       2026-04-18
google/gemma-4-26b-a4b         Local       fair_entered     18   48   66   -    -    -    100*    31     1,055,470  25,174    Free       2026-04-09
google/gemma-4-26b-a4b         Local       fair_entered     18   24   36   -    -    -    100*    42     1,062,862  25,482    Free       2026-04-09
google/gemma-4-26b-a4b         Local       fair_entered     24   48   60   -    -    -    200*    50     2,077,876  24,953    Free       2026-04-11
google/gemma-4-26b-a4b         Local       fair_entered     24   30   84   -    -    -    200*    49     2,391,209  23,392    Free       2026-04-12

* = cycle budget exhausted
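
The leaderboard columns can be combined into a few derived figures. The Python sketch below assumes that ms/cycle is a per-run average (so cycles times ms/cycle approximates wall-clock time) and that Est. Cost divided by Tokens In gives a rough effective price per million input tokens. Neither assumption is stated by the harness, and the function names are illustrative only.

```python
# Two rows copied from the leaderboard:
# (model, cycles, tokens_in, ms_per_cycle, est_cost_usd)
# est_cost_usd is None for runs listed as "Free".
RUNS = [
    ("google/gemma-4-26b-a4b", 200, 2_018_372, 24_398, None),
    ("google/gemini-3-flash-preview", 200, 3_726_277, 13_687, 2.05),
]


def wall_clock_minutes(cycles, ms_per_cycle):
    """Approximate run duration, assuming ms/cycle is a per-run mean."""
    return cycles * ms_per_cycle / 60_000


def usd_per_million_tokens(est_cost_usd, tokens_in):
    """Effective price per million input tokens, or None for free runs."""
    if est_cost_usd is None:
        return None
    return est_cost_usd / tokens_in * 1_000_000


for model, cycles, tokens_in, ms_per_cycle, cost in RUNS:
    minutes = wall_clock_minutes(cycles, ms_per_cycle)
    rate = usd_per_million_tokens(cost, tokens_in)
    rate_text = "free" if rate is None else f"${rate:.2f} per 1M input tokens"
    print(f"{model}: ~{minutes:.0f} min, {tokens_in // cycles:,} tokens/cycle, {rate_text}")
```

Under these assumptions the local gemma run takes roughly twice as long per cycle as the hosted gemini run while sending about half as many input tokens per cycle.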

Cycles per checkpoint

Checkpoint        google/gemma-4-26b-a4b  google/gemini-3-flash-preview  google/gemma-4-31b
Left Bedroom      18                      24                             24
Exited House      24                      30                             48
Reached the Fair  48                      72                             66
Met Marle         180                     90                             -
Reached Telepod   -                       -                              -
Time Traveled     -                       -                              -

Estimated cost per checkpoint

Checkpoint        google/gemini-3-flash-preview
Left Bedroom      $0.25
Exited House      $0.31
Reached the Fair  $0.74
Met Marle         $0.92
Reached Telepod   -
Time Traveled     -
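
The charted costs are consistent with prorating a run's total estimated cost linearly over its cycles. This is an inference from the numbers, not a formula documented by the harness: taking the 2026-04-16 google/gemini-3-flash-preview run ($2.05 over 200 cycles, with checkpoints reached at cycles 24, 30, 72 and 90) reproduces the four charted values. A minimal Python sketch, with a hypothetical function name:

```python
def prorated_cost(total_cost_usd, total_cycles, checkpoint_cycle):
    """Share of the run's estimated cost spent by the given cycle.

    Assumes every cycle costs the same, which is only an approximation:
    real per-cycle token counts vary.
    """
    return total_cost_usd * checkpoint_cycle / total_cycles


# Values copied from the leaderboard row dated 2026-04-16.
TOTAL_COST_USD = 2.05
TOTAL_CYCLES = 200
CHECKPOINT_CYCLES = {
    "Left Bedroom": 24,
    "Exited House": 30,
    "Reached the Fair": 72,
    "Met Marle": 90,
}

costs = {
    name: round(prorated_cost(TOTAL_COST_USD, TOTAL_CYCLES, cycle), 2)
    for name, cycle in CHECKPOINT_CYCLES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

The same proration applied to the other OpenRouter runs would give their per-checkpoint estimates, subject to the same equal-cost-per-cycle assumption.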

All runs

Model                          Last checkpoint  Cycles  Stuck  Date
google/gemma-4-26b-a4b         house_exit       100*    43     2026-04-09
google/gemma-4-26b-a4b         fair_entered     100*    31     2026-04-09
google/gemma-4-26b-a4b         fair_entered     100*    42     2026-04-09
claude-haiku-4-5-20251001      -                100*    12     2026-04-09
google/gemma-4-26b-a4b         fair_entered     200*    50     2026-04-11
google/gemma-4-26b-a4b         marle_met        200*    49     2026-04-11
google/gemma-4-26b-a4b         fair_entered     200*    49     2026-04-12
google/gemma-4-26b-a4b         fair_entered     200*    54     2026-04-12
google/gemma-4-31b             fair_entered     200*    60     2026-04-12
google/gemini-3-flash-preview  -                1       0      2026-04-16
google/gemini-3-flash-preview  marle_met        200*    52     2026-04-16
google/gemini-3-flash-preview  marle_met        200*    53     2026-04-17
google/gemini-3-flash-preview  marle_met        200*    68     2026-04-17
google/gemini-3-flash-preview  marle_met        200*    33     2026-04-17
google/gemini-3-flash-preview  -                91      37     2026-04-17
google/gemini-3-flash-preview  marle_met        200*    68     2026-04-18
google/gemma-4-26b-a4b         fair_entered     200*    31     2026-04-18