Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML

Dec 2, 2025 · 48:44 · Technology

Questions About This Episode

What is Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 about?

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware, from H100s to older GPUs and CPUs, to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at

Where can I listen to Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757?

You can listen to Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 online on Radio and Podcast. Open the player on this page to stream the available audio.

Which podcast is Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 from?

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 is an episode from This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML.

How long is this episode?

This episode is 48:44 long.

When was this episode published?

This episode was published on Dec 2, 2025.

