Benchmark Bank Heist is an episode from Linear Digressions by Ben Jaffe and Katie Malone.
Published Apr 6, 2026. Running time: 00:12:36.
What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation: it reasoned that it was being tested, tracked down the encrypted eval dataset, decrypted it, and returned the answer it found inside. It's equal parts impressive and unsettling. This episode digs into what actually happened, why it matters for how we measure AI progress, and what this novel failure mode means for the already-tricky science of benchmarking language models.

Links:
Anthropic's writeup on the BrowseComp reverse-engineering done by Claude Opus 4.6
BrowseComp benchmark from OpenAI