Open Model Lab

September 2026: Preference Optimization / DPO

Use preference data to refine the SFT model's behavior in a more controlled way, then measure the behavioral side effects.

Gate status

Month: 2026-09
Status: planned
Report: DPO Behavior Impact

Success criterion

The report clearly shows where post-training helps and where it creates risk.

Focus

Design chosen/rejected response pairs.
Build a DPO training pipeline.
Compare Base, SFT, and SFT+DPO on the same eval suite.
Measure helpfulness, factuality, conciseness, coding, over-refusal, and style collapse.
Check for factuality drift and over-optimization.
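
The pair design and the DPO objective named in the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not the lab's pipeline: the `PreferencePair` schema, the `dpo_loss` helper, and the default `beta=0.1` are hypothetical. The inputs are assumed to be summed token log-probabilities of each response, computed elsewhere under the policy being trained and under a frozen reference model (here assumed to be the SFT checkpoint).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference example (hypothetical schema, not taken from the plan)."""
    prompt: str
    chosen: str
    rejected: str

    def __post_init__(self):
        # Reject pairs that carry no preference signal.
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        if self.chosen.strip() == self.rejected.strip():
            raise ValueError("chosen and rejected must differ")


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy still equals the reference, every pair has a margin of zero and a loss of log 2. The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference. Because the objective rewards only that relative margin, checking for factuality drift and over-optimization needs the separate eval suite.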

Expected outputs

training/dpo module.
Base/SFT/SFT+DPO comparison on the shared eval suite.
Preference dataset card.
Report: how preference optimization changes model behavior.
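
One way the three-way comparison could feed the report is as per-metric deltas between two checkpoints, with regressions flagged explicitly. The sketch below is illustrative only: the function name `flag_regressions`, the model labels, the metric keys, and the choice to treat `over_refusal` as lower-is-better are assumptions, and the scores are expected to come from the shared eval suite.

```python
def flag_regressions(scores, baseline="SFT", candidate="SFT+DPO",
                     lower_is_better=("over_refusal",), tolerance=0.0):
    """Return {metric: candidate - baseline} for metrics where the candidate is worse.

    `scores` maps metric name -> {model label -> score} (hypothetical layout).
    """
    regressions = {}
    for metric, by_model in scores.items():
        delta = by_model[candidate] - by_model[baseline]
        # Normalize direction so a positive value always means improvement.
        improvement = -delta if metric in lower_is_better else delta
        if improvement < -tolerance:
            regressions[metric] = delta
    return regressions
```

A result that shows helpfulness rising while factuality or over-refusal appears among the regressions is the pattern the end-of-month decision is meant to catch.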

End-of-month decision

Is DPO improving real behavior or only optimizing preference style?
