Technology

How to Engineer AI Inference Systems with Philip Kiely - #766

This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML

Apr 30, 2026 · 54:51

Episode Details

Published Apr 30, 2026; 54:51 long.

Questions About This Episode

What is How to Engineer AI Inference Systems with Philip Kiely - #766 about?

In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding "the knobs" of inference (batching, quantization, speculation, and KV cache reuse) lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today's runtime landscape, including vLLM, SGLang, and TensorRT-LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most. The complete show notes for this episode can be found at

Which podcast is How to Engineer AI Inference Systems with Philip Kiely - #766 from?

How to Engineer AI Inference Systems with Philip Kiely - #766 is an episode from This Week in Machine Learning & Artificial Intelligence (AI) Podcast by TWIML.

How long is this episode?

This episode is 54:51 long.

When was this episode published?

This episode was published on Apr 30, 2026.
