
How to Engineer AI Inference Systems with Philip Kiely - #766 is an episode from This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML.
Published Apr 30, 2026, 54:51 long, audio available.
In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference—batching, quantization, speculation, and KV cache reuse—lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today’s runtime landscape, including vLLM, SGLang, and TensorRT LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most. The complete show notes for this episode can be found at