Open Model Lab
September 2026: Preference Optimization / DPO
Use preference data to refine the SFT model's behavior in a controlled way, then measure the behavioral side effects.
Gate status
- Month: 2026-09
- Status: planned
- Report: DPO Behavior Impact
Success criterion
The report clearly shows where post-training helps and where it creates risk.
Focus
- Design chosen/rejected response pairs.
- Build a DPO training pipeline.
- Compare Base, SFT, and SFT+DPO on the same eval suite.
- Measure helpfulness, factuality, conciseness, coding, over-refusal, and style collapse.
- Check for factuality drift and over-optimization.
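As a point of reference for the pipeline item above, here is a minimal sketch of the standard per-pair DPO loss, written in plain Python so the arithmetic is easy to inspect. It assumes each argument is the summed token log-probability of a full response under either the policy being trained or the frozen reference model (here, the SFT checkpoint). The function name dpo_loss and the default beta=0.1 are illustrative assumptions, not the planned training/dpo API.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    All inputs are sequence-level log-probabilities (sums over response tokens).
    beta=0.1 is an assumed default, not a value taken from this plan.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids
    # overflow for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss is log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference, which is why a shrinking loss on its own does not separate real behavioral improvement from style fitting.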
Expected outputs
- training/dpo module.
- Base/SFT/SFT+DPO comparison on the shared eval suite.
- Preference dataset card.
- Report: how preference optimization changes model behavior.
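One possible shape for the comparison output is sketched below: per-metric deltas between two stages, plus a list of metrics that moved in the wrong direction. The stage labels, the metric keys, and the helper names compare_stages and regressions are all hypothetical. The sketch assumes every metric is a single float per stage and that the caller states whether higher is better (for example, lower is better for over-refusal).

```python
def compare_stages(scores, baseline="SFT", candidate="SFT+DPO"):
    """Return {metric: candidate - baseline} for metrics present in both stages.

    scores maps a stage label to a {metric: value} dict. The labels and
    metric names are placeholders chosen for this sketch.
    """
    base = scores[baseline]
    cand = scores[candidate]
    shared = sorted(set(base) & set(cand))
    return {metric: cand[metric] - base[metric] for metric in shared}


def regressions(deltas, higher_is_better, tolerance=0.0):
    """List metrics whose delta moved in the wrong direction by more than tolerance.

    higher_is_better maps a metric to True (e.g. factuality) or
    False (e.g. over-refusal rate).
    """
    flagged = []
    for metric, delta in deltas.items():
        # Flip the sign for lower-is-better metrics so that a negative
        # signed value always means "got worse".
        signed = delta if higher_is_better[metric] else -delta
        if signed < -tolerance:
            flagged.append(metric)
    return sorted(flagged)
```

Reporting the flagged metrics next to the gains keeps side effects such as factuality drift or rising over-refusal visible in the same table as the improvements.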
End-of-month decision
Is DPO improving real behavior or only optimizing preference style?