<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Benchmarks on danielhua.com</title><link>https://danielhua.com/tags/benchmarks/</link><description>Recent content in Benchmarks on danielhua.com</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Daniel Hua</copyright><lastBuildDate>Mon, 11 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://danielhua.com/tags/benchmarks/index.xml" rel="self" type="application/rss+xml"/><item><title>Chessbench</title><link>https://danielhua.com/posts/2026-05-11-chessbench-findings/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://danielhua.com/posts/2026-05-11-chessbench-findings/</guid><description>&lt;h2 class="relative group">Overview
 &lt;div id="overview" class="anchor">&lt;/div>
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#overview" aria-label="Anchor">#&lt;/a>
 &lt;/span>
 
&lt;/h2>
&lt;p>I wrote a simple &lt;a href="https://github.com/danielhuadotcom/chessbench" target="_blank" rel="noreferrer">harness&lt;/a> to run OpenRouter LLMs in known-winning endgame positions.&lt;/p>
&lt;p>There&amp;rsquo;s plenty of prior art&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;sup>,&lt;/sup>&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup> on LLMs playing chess, dating back even to a cool fine-tuned GPT-2 project&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>,
but overall, they &lt;em>still&lt;/em> can&amp;rsquo;t really do this. Surprising finding!&lt;/p>

&lt;h2 class="relative group">Results
 &lt;div id="results" class="anchor">&lt;/div>
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#results" aria-label="Anchor">#&lt;/a>
 &lt;/span>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://danielhua.com/chessbench/run_logs/report_results.html" >Aggregated Result Table&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>I burned on the order of $200 to obtain these runs.&lt;/p></description></item></channel></rss>