
Reinforcement Fine-Tuning and the Future of Specialized AI Models
Aug 5, 2025 - 40:24
Reward Models Data Brew Episode 40 is an episode of Data Brew by Databricks.
Published Mar 20, 2025. Length: 39:58.
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.

Connect with Brandon Cui: