Open Model Lab

August 2026: SFT Pipeline + Data Quality

Measure the behavioral difference between a base model and an instruction-tuned model.

Gate status

Month: 2026-08
Status: planned
Report: Base vs SFT Behavior Change

Success criterion

Fine-tuning alone does not pass the gate: the report must show measurable behavior changes between the base and SFT models, including regressions.

Focus

Build an SFT pipeline.
Define and clean instruction data.
Compare base vs SFT behavior using the same eval harness.
Measure eval deltas: what improved, and what regressed?
Create dataset and model cards.
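
As one illustration of the "define and clean instruction data" step, here is a minimal sketch of a cleaning pass. It only drops empty pairs and exact duplicates after normalization. The record keys "instruction" and "response", the function names, and the min_response_chars threshold are assumptions made for this example, not the project's actual schema or pipeline.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Unicode-normalize, lowercase, and collapse whitespace for comparison."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def clean_instruction_data(records, min_response_chars=1):
    """Drop empty pairs and exact duplicates (after normalization).

    `records` is an iterable of dicts with the hypothetical keys
    "instruction" and "response". Returns (kept_records, stats).
    """
    seen = set()
    kept = []
    stats = {"total": 0, "empty": 0, "duplicate": 0, "kept": 0}
    for rec in records:
        stats["total"] += 1
        instruction = (rec.get("instruction") or "").strip()
        response = (rec.get("response") or "").strip()
        # Reject pairs with a missing instruction or a too-short response.
        if not instruction or len(response) < min_response_chars:
            stats["empty"] += 1
            continue
        # Hash the normalized pair so near-identical casing/spacing collapses.
        key = hashlib.sha256(
            (normalize(instruction) + "\x00" + normalize(response)).encode("utf-8")
        ).hexdigest()
        if key in seen:
            stats["duplicate"] += 1
            continue
        seen.add(key)
        kept.append({"instruction": instruction, "response": response})
        stats["kept"] += 1
    return kept, stats
```

The returned stats (total, empty, duplicate, kept) are the kind of counts a dataset card could report. Fuzzy deduplication and content filtering are deliberately left out of this sketch.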

Expected outputs

training/sft module.
Base/SFT behavior comparison table.
Dataset card for instruction_v1.
Report: what behavior does SFT improve and what can it damage?
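
The base/SFT comparison table could be assembled along the following lines. This is a sketch under stated assumptions: each model's eval results are a mapping from task name to a single score where higher is better, both runs come from the same harness (so the task sets must match), and the tolerance of 0.01 is an arbitrary placeholder. The function names and example task names are hypothetical.

```python
def compare_runs(base_scores, sft_scores, tolerance=0.01):
    """Return one row per task: (task, base, sft, delta, verdict).

    Assumes higher scores are better and that both dicts were produced
    by the same eval harness, so their task sets must be identical.
    """
    if set(base_scores) != set(sft_scores):
        raise ValueError("base and SFT runs cover different tasks")
    rows = []
    for task in sorted(base_scores):
        delta = sft_scores[task] - base_scores[task]
        if delta > tolerance:
            verdict = "improved"
        elif delta < -tolerance:
            verdict = "regressed"
        else:
            verdict = "unchanged"
        rows.append((task, base_scores[task], sft_scores[task], delta, verdict))
    return rows


def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'task':<20}{'base':>8}{'sft':>8}{'delta':>9}  verdict"]
    for task, base, sft, delta, verdict in rows:
        lines.append(
            f"{task:<20}{base:>8.3f}{sft:>8.3f}{delta:>+9.3f}  {verdict}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up placeholder scores, only to show the call pattern.
    base = {"instruction_following": 0.5, "closed_book_qa": 0.75}
    sft = {"instruction_following": 0.75, "closed_book_qa": 0.5}
    print(format_table(compare_runs(base, sft)))
```

Raising on mismatched task sets is a design choice that enforces the "same eval harness" requirement; rows marked "regressed" are the ones the report would need to explain.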

End-of-month decision

Is the SFT checkpoint reliable enough to serve as the reference for preference optimization?
