Open Model Lab

September 2026: Preference Optimization / DPO

Use preference data and Direct Preference Optimization (DPO) to refine the supervised fine-tuned (SFT) model in a more controlled way, then measure the behavioral side effects.

Gate status

Month: 2026-09
Status: planned
Report: DPO Behavior Impact

Success criterion

The report clearly shows where post-training helps and where it creates risk.

Focus

  • Design chosen/rejected response pairs.
  • Build a DPO training pipeline.
  • Compare Base, SFT, and SFT+DPO on the same eval suite.
  • Measure helpfulness, factuality, conciseness, coding, over-refusal, and style collapse.
  • Check for factuality drift and over-optimization.
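As a rough illustration of the first two focus items, the sketch below shows one possible shape for a chosen/rejected pair and the standard per-pair DPO objective, -log sigmoid(beta * ((policy chosen - reference chosen) - (policy rejected - reference rejected))), computed on summed sequence log-probabilities. The names PreferencePair and dpo_loss and the default beta=0.1 are assumptions for illustration, not part of the planned training/dpo module; a real pipeline would compute the log-probabilities with a tensor library over batches.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record: one prompt with a preferred and a dispreferred response."""
    prompt: str
    chosen: str
    rejected: str


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    The reference model is the frozen starting checkpoint (here, the SFT model).
    beta=0.1 is an assumed default, not a value taken from this plan.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


pair = PreferencePair(
    prompt="Explain what a mutex is.",
    chosen="A mutex is a lock that lets only one thread enter a critical section at a time.",
    rejected="Mutexes are great! There is so much to say about them.",
)

# When the policy still equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) only when the policy raises the chosen response relative to the rejected one more than the reference does, which is why the reference log-probabilities must come from the same frozen checkpoint throughout training.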

Expected outputs

  • training/dpo module.
  • Base/SFT/SFT+DPO comparison on the shared eval suite.
  • Preference dataset card.
  • Report: how preference optimization changes model behavior.
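The comparison output could be produced by something like the sketch below, which diffs two checkpoints metric by metric and labels each change. The metric keys mirror the Focus list; the 0-to-1 score scale, the tolerance, the assumption that over-refusal and style collapse are "lower is better", and every number in the example are placeholders, not results.

```python
METRICS = ["helpfulness", "factuality", "conciseness",
           "coding", "over_refusal", "style_collapse"]
# Assumption: these two are rates where a lower value is the better outcome.
LOWER_IS_BETTER = {"over_refusal", "style_collapse"}


def compare(before, after, tolerance=0.01):
    """Return {metric: (delta, verdict)} for two score dicts on the same eval suite."""
    report = {}
    for metric in METRICS:
        delta = after[metric] - before[metric]
        # Flip the sign so that a positive value always means "got better".
        signed = -delta if metric in LOWER_IS_BETTER else delta
        if signed > tolerance:
            verdict = "improved"
        elif signed < -tolerance:
            verdict = "regressed"
        else:
            verdict = "unchanged"
        report[metric] = (delta, verdict)
    return report


# Made-up placeholder scores, only to show the call shape.
sft = {"helpfulness": 0.60, "factuality": 0.70, "conciseness": 0.50,
       "coding": 0.40, "over_refusal": 0.10, "style_collapse": 0.05}
sft_dpo = {"helpfulness": 0.70, "factuality": 0.65, "conciseness": 0.50,
           "coding": 0.40, "over_refusal": 0.20, "style_collapse": 0.05}

for metric, (delta, verdict) in compare(sft, sft_dpo).items():
    print(f"{metric:15s} {delta:+.2f}  {verdict}")
```

Running the same function for Base vs SFT and SFT vs SFT+DPO keeps the two post-training steps separable in the report.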

End-of-month decision

Is DPO improving real behavior or only optimizing preference style?
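One cheap input to this decision is whether the preference data can be satisfied by style alone. Length is a commonly reported confound in preference optimization, so the sketch below measures how often the chosen response is simply the longer one. The function name length_bias and whitespace tokenization are assumptions for illustration; a value near 0.5 suggests no length preference, while a value near 1.0 suggests the model could win by writing more.

```python
def length_bias(pairs):
    """Fraction of pairs whose chosen response is longer than the rejected one.

    pairs: iterable of (chosen, rejected) strings.
    Pairs of equal length are ignored; returns 0.5 if no pair differs in length.
    """
    longer = shorter = 0
    for chosen, rejected in pairs:
        chosen_len = len(chosen.split())
        rejected_len = len(rejected.split())
        if chosen_len > rejected_len:
            longer += 1
        elif chosen_len < rejected_len:
            shorter += 1
    decided = longer + shorter
    return longer / decided if decided else 0.5


# Toy pairs, invented only to exercise the function.
toy_pairs = [
    ("a b c d", "a b"),
    ("a b c", "a"),
    ("a b c d e", "a b c"),
    ("a", "a b c"),
]
print(length_bias(toy_pairs))
```

The same check can be repeated on model outputs: if SFT+DPO responses grow longer while factuality and coding scores stay flat, that points toward style optimization rather than a real behavior change.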