Version 2 · Archive

ChronoBench

Measuring how far different language models can progress in Chrono Trigger using an autonomous vision-based game agent with procedural skills, curated notes, and event-driven strategic reasoning.

Last updated: 2026-04-19 · 200-cycle budget · v2 harness · 6 runs frozen.

These runs used the v2 harness. For current runs on the evidence-validating v3 harness, see Version 3.

Leaderboard

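| Model | Provider | Last Checkpoint | 1 | 2 | 3 | 4 | 5 | 6 | Cycles | Stuck | Skills | Superego | Tokens In | ms/cycle | Est. Cost | Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| google/gemini-3-flash-preview | OpenRouter | marle_met | 24 | 44 | 48 | 65 | | | 200* | 59 | 7 | 24 | 4,200,240 | 13,907 | $2.29 | 2026-04-18 |
| xiaomi/mimo-v2-omni | OpenRouter | fair_entered | 15 | 42 | 78 | | | | 193 | 31 | 7 | 35 | 1,786,899 | 60,361 | | 2026-04-19 |
| qwen/qwen3.6-35b-a3b | Local | fair_entered | 23 | 33 | 65 | | | | 200* | 8 | | 34 | 1,992,175 | 33,893 | Free | 2026-04-18 |
| z-ai/glm-5v-turbo | OpenRouter | fair_entered | 26 | 36 | 98 | | | | 200* | 16 | | 28 | 1,932,259 | 47,551 | $3.08 | 2026-04-18 |
| google/gemma-4-26b-a4b | OpenRouter | house_exit | 17 | 64 | | | | | 200* | 57 | | 26 | 2,159,500 | 25,968 | | 2026-04-19 |
| google/gemma-4-e4b | OpenRouter | house_exit | 16 | 63 | | | | | 200* | 151 | 2 | 42 | 2,338,563 | 22,846 | | 2026-04-18 |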

* = cycle budget exhausted

Cycles per checkpoint

| Model | Left Bedroom | Exited House | Reached the Fair | Met Marle | Reached Telepod | Time Traveled |
| --- | --- | --- | --- | --- | --- | --- |
| google/gemini-3-flash-preview | 24 | 44 | 48 | 65 | | |
| xiaomi/mimo-v2-omni | 15 | 42 | 78 | | | |
| qwen/qwen3.6-35b-a3b | 23 | 33 | 65 | | | |
| z-ai/glm-5v-turbo | 26 | 36 | 98 | | | |
| google/gemma-4-26b-a4b | 17 | 64 | | | | |
| google/gemma-4-e4b | 16 | 63 | | | | |
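
The checkpoint figures above appear to be running totals, since they increase from one checkpoint to the next and never exceed the run's cycle count. Under that assumption, the cycles spent on each individual segment can be recovered by differencing. The following is a minimal sketch; the names `cumulative_cycles` and `segment_cycles` are hypothetical and not part of the ChronoBench harness.

```python
# Cycle counts at each reached checkpoint, copied from the figures above.
# Assumption: these are cumulative totals, not per-segment counts.
CHECKPOINTS = ["Left Bedroom", "Exited House", "Reached the Fair", "Met Marle"]

cumulative_cycles = {
    "google/gemini-3-flash-preview": [24, 44, 48, 65],
    "xiaomi/mimo-v2-omni": [15, 42, 78],
    "qwen/qwen3.6-35b-a3b": [23, 33, 65],
    "z-ai/glm-5v-turbo": [26, 36, 98],
    "google/gemma-4-26b-a4b": [17, 64],
    "google/gemma-4-e4b": [16, 63],
}


def segment_cycles(cumulative):
    """Convert running totals into cycles spent on each segment."""
    previous = 0
    segments = []
    for total in cumulative:
        segments.append(total - previous)
        previous = total
    return segments


if __name__ == "__main__":
    for model, totals in cumulative_cycles.items():
        # zip stops at the last checkpoint the model actually reached.
        per_segment = dict(zip(CHECKPOINTS, segment_cycles(totals)))
        print(model, per_segment)
```

Read this way, google/gemini-3-flash-preview needed only 4 cycles to get from the house exit to the fair, while z-ai/glm-5v-turbo needed 62 for the same segment.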

Estimated cost per checkpoint

| Model | Left Bedroom | Exited House | Reached the Fair | Met Marle | Reached Telepod | Time Traveled |
| --- | --- | --- | --- | --- | --- | --- |
| google/gemini-3-flash-preview | $0.27 | $0.50 | $0.55 | $0.74 | | |
| z-ai/glm-5v-turbo | $0.40 | $0.56 | $1.51 | | | |
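
Pairing each estimated cost above with the cycle count at the same checkpoint gives a rough average cost per cycle. The sketch below assumes both figures are cumulative and refer to the same moment in the run; `cost_at_checkpoint` and `cost_per_cycle` are hypothetical names used only for illustration.

```python
# (cycle count, estimated cumulative cost in USD) at each reached checkpoint,
# copied from the cycle and cost figures on this page.
cost_at_checkpoint = {
    "google/gemini-3-flash-preview": [(24, 0.27), (44, 0.50), (48, 0.55), (65, 0.74)],
    "z-ai/glm-5v-turbo": [(26, 0.40), (36, 0.56), (98, 1.51)],
}


def cost_per_cycle(points):
    """Average cost per cycle up to each checkpoint (cost divided by cycles)."""
    return [cost / cycle for cycle, cost in points]


if __name__ == "__main__":
    for model, points in cost_at_checkpoint.items():
        rates = ", ".join(f"${rate:.4f}" for rate in cost_per_cycle(points))
        print(f"{model}: {rates}")
```

Under these assumptions both models show a roughly constant rate across checkpoints, a little over one cent per cycle for google/gemini-3-flash-preview and about one and a half cents for z-ai/glm-5v-turbo.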