Open Model Lab

DPO Behavior Impact

How does preference optimization change helpfulness, instruction following, factuality, conciseness, coding, and refusal behavior?

Status

Status: planned
Month/theme: September 2026 (Preference Optimization / DPO)

This page is a report scaffold. It does not yet contain model scores, charts, or completed run results.

Research question

How does preference optimization change helpfulness, instruction following, factuality, conciseness, coding, and refusal behavior?

Planned setup

Design chosen/rejected preference pairs.
Train a DPO variant from the SFT checkpoint.
Compare Base, SFT, and SFT+DPO on the same eval suite.
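A minimal sketch of the training objective, assuming the DPO variant is the standard sigmoid-loss DPO and that the frozen reference model is the SFT checkpoint the run starts from. The function names and the default `beta=0.1` are illustrative assumptions, not the API of the planned training/dpo module; the inputs are summed per-response token log-probabilities, which would be computed elsewhere.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed strength of the implicit KL constraint; tune per run
) -> float:
    """Sigmoid DPO loss for one (prompt, chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response under
    the trainable policy or the frozen reference (assumed: the SFT checkpoint).
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-42.0, -40.0, -42.0, -40.0))
    # Raising the chosen log-ratio relative to the rejected one lowers the loss.
    print(dpo_loss(-38.0, -44.0, -42.0, -40.0))
```

Working in log space with `log1p` avoids overflow for large margins, which matters once the policy separates chosen and rejected responses sharply.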

Planned measurements

Scores, where the grader supports scoring.
Latency and cost where the run infrastructure can measure them.
Output-quality notes and failure-mode labels.
Known caveats and reproducibility requirements.
Over-refusal, style collapse, and factuality drift checks.
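One way the over-refusal and conciseness checks could be tallied, as a sketch. It assumes each output has already been graded and labeled (a hypothetical `GradedOutput` record with `should_answer` and `refused` labels) and reports, per variant, the refusal rate on prompts that should be answered plus mean response length as a rough style-drift signal. Label definitions and token counting are assumptions to be fixed in the eval design.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GradedOutput:
    variant: str          # e.g. "base", "sft", "sft+dpo"
    prompt_id: str
    should_answer: bool   # hypothetical label: benign prompt a helpful model should answer
    refused: bool         # hypothetical failure-mode label from a grader or reviewer
    response_tokens: int  # response length; the tokenizer used is an assumption


def summarize(outputs: list[GradedOutput]) -> dict[str, dict]:
    """Per-variant over-refusal rate and mean response length."""
    by_variant: dict[str, list[GradedOutput]] = {}
    for out in outputs:
        by_variant.setdefault(out.variant, []).append(out)

    summary = {}
    for variant, rows in by_variant.items():
        benign = [r for r in rows if r.should_answer]
        summary[variant] = {
            "n": len(rows),
            # Share of answerable prompts that were refused (None if there are none).
            "over_refusal_rate": (
                sum(r.refused for r in benign) / len(benign) if benign else None
            ),
            "mean_response_tokens": mean(r.response_tokens for r in rows),
        }
    return summary


if __name__ == "__main__":
    rows = [
        GradedOutput("sft", "p1", True, False, 120),
        GradedOutput("sft", "p2", True, True, 30),
        GradedOutput("sft+dpo", "p1", True, True, 20),
        GradedOutput("sft+dpo", "p2", True, True, 24),
    ]
    print(summarize(rows))
```

Comparing variants on the same `prompt_id` set keeps the rates comparable across Base, SFT, and SFT+DPO.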

Planned sections

Research question and claim boundary
Setup, model variants, data versions, and config hashes
Eval suite or task design
Measurements and failure modes
Limitations, caveats, and next decision
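The setup section calls for config hashes. A minimal sketch of one common approach, assuming configs are JSON-serializable dicts: serialize with sorted keys and fixed separators so key order does not change the hash, then take a SHA-256 digest. The config fields and the 12-character truncation are illustrative assumptions, not the project's schema.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 12) -> str:
    """Stable short hash of a JSON-serializable config dict."""
    canonical = json.dumps(
        config, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


if __name__ == "__main__":
    # Hypothetical DPO run config; field names are placeholders.
    run = {
        "variant": "sft+dpo",
        "init_checkpoint": "sft",
        "preference_data_version": "v0",
        "beta": 0.1,
        "learning_rate": 5e-7,
    }
    print(config_hash(run))
```

Truncating to 12 hex characters keeps hashes readable in report tables; storing the full digest alongside is safer if collisions ever matter.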

Expected artifacts

training/dpo module.
Base/SFT/DPO comparison.
Preference dataset card.
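A preference dataset card needs basic pair counts and sanity checks. The sketch below assumes a hypothetical `PreferencePair` record with a `category` tag mirroring the behaviors under study. It flags duplicate IDs, empty fields, and identical chosen/rejected responses, and reports how often the chosen response is longer than the rejected one, since a strong length skew in the pairs can push DPO toward verbosity. The schema is an assumption, not the planned card format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    pair_id: str
    prompt: str
    chosen: str
    rejected: str
    category: str  # hypothetical tag, e.g. "coding", "factuality", "refusal"


def validate_pairs(pairs: list[PreferencePair]) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    seen_ids = set()
    for p in pairs:
        if p.pair_id in seen_ids:
            problems.append(f"{p.pair_id}: duplicate pair_id")
        seen_ids.add(p.pair_id)
        if not (p.prompt.strip() and p.chosen.strip() and p.rejected.strip()):
            problems.append(f"{p.pair_id}: empty prompt or response")
        elif p.chosen.strip() == p.rejected.strip():
            problems.append(f"{p.pair_id}: chosen and rejected are identical")
    return problems


def card_stats(pairs: list[PreferencePair]) -> dict:
    """Summary numbers for the dataset card."""
    chosen_longer = sum(len(p.chosen) > len(p.rejected) for p in pairs)
    return {
        "num_pairs": len(pairs),
        "pairs_per_category": dict(Counter(p.category for p in pairs)),
        # Character length is a crude proxy; token length may be preferable.
        "frac_chosen_longer": chosen_longer / len(pairs) if pairs else 0.0,
    }
```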

Claim boundary

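This report will not treat a lower preference loss as evidence of higher model quality; quality claims will rest on the Base, SFT, and SFT+DPO eval comparisons.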
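A short worked example of why this boundary matters, assuming the standard sigmoid DPO objective with reference policy \pi_ref (taken here to be the SFT checkpoint) and an illustrative β = 0.1. The loss depends only on the difference between the chosen and rejected log-ratios, so it can fall even while the chosen response becomes less likely than under the reference. The numbers are hypothetical, not run results.

```latex
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\!\Big(\beta\,\big[\,r_w - r_l\,\big]\Big),
\qquad
r_w = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)},
\quad
r_l = \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}

\text{At initialization } (\pi_\theta = \pi_{\mathrm{ref}}):\;
r_w = r_l = 0 \;\Rightarrow\; \mathcal{L} = \log 2 \approx 0.693

\text{After training, suppose } r_w = -2,\; r_l = -12,\; \beta = 0.1:\;
\beta\,(r_w - r_l) = 1.0 \;\Rightarrow\;
\mathcal{L} = \log\!\big(1 + e^{-1}\big) \approx 0.313
```

In this example the loss falls by more than half, yet the chosen response's log-probability drops by 2 nats relative to the reference. A lower loss says only that the policy separates chosen from rejected more sharply, not that outputs became more helpful, factual, or concise.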

Related links

Reports index
Related month page
Runs