Radio and Podcast

Benchmarking AI Models

Linear Digressions by Ben Jaffe and Katie Malone

Mar 30, 2026 | 00:29:55 | Technology

About This Episode

Benchmarking AI Models is an episode from Linear Digressions by Ben Jaffe and Katie Malone.

Episode Details

Published Mar 30, 2026. Length: 00:29:55. Audio available.

Questions About This Episode

What is Benchmarking AI Models about?

How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks (the standardized tests used to compare models) and explore two canonical examples: MMLU, a 14,000-question multiple-choice gauntlet spanning medicine, law, and philosophy; and SWE-bench, which throws real GitHub bugs at models to see if they can fix them. Along the way: Goodhart's Law, data contamination, canary strings, and why acing a test isn't always the same as being smart.

Where can I listen to Benchmarking AI Models?

You can listen to Benchmarking AI Models online on Radio and Podcast. Open the player on this page to stream the available audio.

Which podcast is Benchmarking AI Models from?

Benchmarking AI Models is an episode from Linear Digressions by Ben Jaffe and Katie Malone.

How long is this episode?

This episode is 00:29:55 long.

When was this episode published?

This episode was published on Mar 30, 2026.
