Chessbench
·2 mins
Overview # I wrote a simple harness to run OpenRouter LLMs in known-winning endgame positions.
There’s plenty of prior-art1,2 on LLMs playing chess, dating even back to a cool fine-tuned GPT-2 project3, but overall: they still can’t really do this. Surprising finding!
Results # Aggregated Result Table I burned on the order of $200 to obtain these runs.