Open Model Lab
DPO Behavior Impact
How does preference optimization change helpfulness, instruction following, factuality, conciseness, coding, and refusal behavior?
Status
- Status: planned
- Month/theme: September 2026: Preference Optimization / DPO
Status: Planned. This page is a report scaffold. It does not contain model scores, charts, or completed run results.
Research question
How does preference optimization change helpfulness, instruction following, factuality, conciseness, coding, and refusal behavior?
Planned setup
- Design chosen/rejected preference pairs.
- Train a DPO variant from the SFT checkpoint.
- Compare Base, SFT, and SFT+DPO on the same eval suite.
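The setup above rests on two pieces: a chosen/rejected pair record and the DPO objective computed from policy and reference log-probabilities. The following is a minimal sketch of both using only the Python standard library. The names `PreferencePair` and `dpo_loss`, and the default `beta=0.1`, are illustrative assumptions and not the planned `training/dpo` module; a real run would compute the summed token log-probabilities with the actual SFT checkpoint as the reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference example (hypothetical schema, not the planned dataset card)."""
    prompt: str
    chosen: str    # response preferred by the annotator or rule
    rejected: str  # response judged worse for the same prompt


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under the
    policy or the frozen reference (here assumed to be the SFT checkpoint).
    """
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(z)) equals softplus(-z); computed in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)
```

At initialization the policy and reference agree, so every pair starts at a loss of log(2); the loss falls only as the policy widens the chosen/rejected gap relative to the reference.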
Planned measurements
- Task scores, for evals where the grader can produce a score.
- Latency and cost where the run infrastructure can measure them.
- Output-quality notes and failure-mode labels.
- Known caveats and reproducibility requirements.
- Over-refusal, style collapse, and factuality drift checks.
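As one illustration of the over-refusal check listed above, the sketch below estimates how often a model variant refuses prompts that are known to be benign. The marker phrases and the function names `looks_like_refusal` and `over_refusal_rate` are assumptions made for this example; keyword matching is a crude stand-in for whatever grader the eval suite ends up using, and the strings in the usage lines are toy inputs, not run outputs.

```python
# Hypothetical marker phrases; a real check would likely use a tuned grader.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i won't")


def looks_like_refusal(response: str) -> bool:
    """Heuristic: a refusal phrase appears near the start of the response."""
    # Normalize curly apostrophes so "can’t" and "can't" match the same marker.
    head = response.replace("\u2019", "'").strip().lower()[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)


def over_refusal_rate(responses_to_benign_prompts: list[str]) -> float:
    """Fraction of responses to benign prompts that look like refusals."""
    if not responses_to_benign_prompts:
        return 0.0
    refused = sum(looks_like_refusal(r) for r in responses_to_benign_prompts)
    return refused / len(responses_to_benign_prompts)


# Toy inputs for illustration only.
toy_responses = ["I can't help with that.", "Sure, here is the function."]
toy_rate = over_refusal_rate(toy_responses)
```

Running the same function over Base, SFT, and SFT+DPO responses to an identical benign prompt set would give comparable rates, provided the prompt set and the marker list are held fixed across variants.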
Planned sections
- Research question and claim boundary
- Setup, model variants, data versions, and config hashes
- Eval suite or task design
- Measurements and failure modes
- Limitations, caveats, and next decision
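The config hashes mentioned in the section list could be produced along the following lines. This is a sketch under stated assumptions: the function name `config_hash`, the 12-character truncation, and the example keys are hypothetical, and the report may settle on a different scheme. The idea is to serialize the config canonically so that key order does not change the hash.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Short, order-independent fingerprint of a JSON-serializable config."""
    # sort_keys and fixed separators give one canonical string per config.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:12]  # truncated for readability in tables (an assumption)


# Hypothetical config keys, shown only to demonstrate order independence.
h1 = config_hash({"variant": "sft+dpo", "beta": 0.1})
h2 = config_hash({"beta": 0.1, "variant": "sft+dpo"})
```

Recording such a hash next to each reported number makes it possible to tell whether two results came from the same configuration.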
Expected artifacts
- training/dpo module.
- Base/SFT/DPO comparison.
- Preference dataset card.
Claim boundary
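This report will not treat preference loss as a measure of model quality. A lower DPO loss shows only that the policy separates chosen from rejected responses more strongly relative to the reference model; any quality claim must rest on the eval suite.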