
The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 is an episode from This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML.
Published Mar 26, 2026, 63:18 long, audio available.
Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches, traditionally used for images, are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at