
Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 is an episode from The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) by TWIML.
Published Dec 2, 2025, 48:44 long, audio available.
In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware, from H100s to older GPUs and CPUs, to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at