Open Model Lab
August 2026: SFT Pipeline + Data Quality
Measure the behavioral difference between a base model and an instruction-tuned model.
Gate status
- Month: 2026-08
- Status: planned
- Report: Base vs SFT Behavior Change
Success criterion
The work does not stop at fine-tuning; it shows measurable behavior changes, including regressions.
Focus
- Build an SFT pipeline.
- Define and clean instruction data.
- Compare base vs SFT behavior using the same eval harness.
- Measure eval changes in both directions: what improved and what regressed?
- Create dataset and model cards.
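One way the "define and clean instruction data" step could look is sketched below. It is a minimal illustration, not the project's actual pipeline: the record fields `instruction` and `response`, the length limits, and the function name `clean_instruction_data` are all assumptions made for this example. It normalizes whitespace, drops empty or out-of-range records, removes case-insensitive exact duplicates, and returns drop counts that could feed a dataset card.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace to one space and strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_instruction_data(records, min_response_chars=1, max_response_chars=8000):
    """Return (kept, stats) after filtering a list of instruction records.

    Assumed (hypothetical) schema: each record is a dict with string
    fields "instruction" and "response".
    """
    kept = []
    seen = set()
    stats = {"input": 0, "empty": 0, "length": 0, "duplicate": 0, "kept": 0}

    for rec in records:
        stats["input"] += 1
        instruction = normalize(rec.get("instruction") or "")
        response = normalize(rec.get("response") or "")

        # Drop records where either side is missing or blank.
        if not instruction or not response:
            stats["empty"] += 1
            continue

        # Drop responses outside the configured character range.
        if not (min_response_chars <= len(response) <= max_response_chars):
            stats["length"] += 1
            continue

        # Case-insensitive exact-duplicate check on the normalized pair.
        pair = instruction.lower() + "\x1f" + response.lower()
        key = hashlib.sha256(pair.encode("utf-8")).hexdigest()
        if key in seen:
            stats["duplicate"] += 1
            continue

        seen.add(key)
        kept.append({"instruction": instruction, "response": response})
        stats["kept"] += 1

    return kept, stats
```

Returning the per-reason drop counts alongside the cleaned records keeps the cleaning step auditable. Near-duplicate detection and content-quality filters are deliberately left out of this sketch.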
Expected outputs
- training/sft module.
- Base/SFT behavior comparison table.
- Dataset card for instruction_v1.
- Report: what behavior does SFT improve and what can it damage?
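The base/SFT comparison table could be produced along the following lines. This is a hedged sketch under stated assumptions: scores are per-task numbers where higher is better, both models were run through the same eval harness (so the task sets must match), and the names `compare_scores`, `format_table`, and the `tolerance` threshold are hypothetical. A fixed tolerance is a stand-in for a proper noise estimate, not a significance test.

```python
def compare_scores(base, sft, tolerance=0.01):
    """Label each task as improved, regressed, or unchanged.

    `base` and `sft` map task name -> score (higher is better, an assumption).
    A change within +/- `tolerance` is treated as noise.
    """
    if set(base) != set(sft):
        # Same harness implies the same task set; refuse to compare otherwise.
        raise ValueError("base and SFT results cover different tasks")

    rows = []
    for task in sorted(base):
        delta = sft[task] - base[task]
        if delta > tolerance:
            verdict = "improved"
        elif delta < -tolerance:
            verdict = "regressed"
        else:
            verdict = "unchanged"
        rows.append(
            {"task": task, "base": base[task], "sft": sft[task],
             "delta": delta, "verdict": verdict}
        )
    return rows


def format_table(rows):
    """Render comparison rows as a fixed-width plain-text table."""
    lines = [f"{'task':<20} {'base':>8} {'sft':>8} {'delta':>8}  verdict"]
    for r in rows:
        lines.append(
            f"{r['task']:<20} {r['base']:>8.3f} {r['sft']:>8.3f} "
            f"{r['delta']:>+8.3f}  {r['verdict']}"
        )
    return "\n".join(lines)
```

Listing regressed and unchanged tasks in the same table as the improvements keeps the report aligned with the success criterion: the damage SFT causes is shown next to the gains rather than filtered out.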
End-of-month decision
Is the SFT checkpoint reliable enough to serve as the reference for preference optimization?