The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML

Mar 26, 2026 | 63:18 | Technology

Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches, traditionally used for images, are bein...

Questions About This Episode

What is The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 about?

Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches, traditionally used for images, are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at

Which podcast is The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 from?

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 is an episode from This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML.

How long is this episode?

This episode is 63:18 long.

When was this episode published?

This episode was published on Mar 26, 2026.
