Open Model Lab
Base vs SFT Behavior Change
Which behaviors improve or regress after supervised fine-tuning?
Status
- Status: planned
- Month/theme: August 2026: SFT Pipeline + Data Quality
Status: Planned. This page is a report scaffold. It does not contain model scores, charts, or completed run results.
Research question
Which behaviors improve or regress after supervised fine-tuning?
Planned setup
- Build the first SFT pipeline.
- Create and document the instruction_v1 dataset.
- Evaluate base and SFT variants on the same eval suite.
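The comparison step in this setup could be sketched as follows. This is a minimal illustration, not the planned implementation: the function name classify_changes, the tolerance parameter, and the task IDs are all hypothetical, and the numbers in the usage lines are placeholders rather than measured results. It assumes each variant yields one numeric score per task and that higher is better.

```python
from typing import Dict


def classify_changes(
    base: Dict[str, float],
    sft: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, str]:
    """Label each shared task as improved, regressed, or unchanged.

    Only tasks present in both runs are compared, so the two variants
    are always judged on the same eval items.
    """
    labels: Dict[str, str] = {}
    for task in sorted(set(base) & set(sft)):
        delta = sft[task] - base[task]
        if delta > tolerance:
            labels[task] = "improved"
        elif delta < -tolerance:
            labels[task] = "regressed"
        else:
            labels[task] = "unchanged"
    return labels


# Placeholder values for illustration only; these are not run results.
example_base = {"task_a": 0.5, "task_b": 0.5, "task_c": 0.5}
example_sft = {"task_a": 0.7, "task_b": 0.3, "task_c": 0.505}
example_labels = classify_changes(example_base, example_sft)
```

The tolerance keeps small score differences from being reported as behavior changes. A suitable value would depend on the grader's noise, which this scaffold does not yet specify.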
Planned measurements
- Scores, for tasks where the grader produces a score.
- Latency and cost, where the run infrastructure can measure them.
- Output-quality notes and failure-mode labels.
- Known caveats and reproducibility requirements.
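One way to hold these measurements is a per-example record in which score, latency, and cost are optional, since the grader or infrastructure may not supply them for every task. The sketch below is an assumption about shape only: EvalRecord, its field names, and mean_score are hypothetical and do not describe an existing schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvalRecord:
    task_id: str
    variant: str                       # e.g. "base" or "sft"
    score: Optional[float] = None      # None when the grader gives no score
    latency_s: Optional[float] = None  # None when latency was not measured
    cost_usd: Optional[float] = None   # None when cost was not measured
    notes: str = ""                    # free-text output-quality notes
    failure_modes: List[str] = field(default_factory=list)


def mean_score(records: List[EvalRecord]) -> Optional[float]:
    """Average only the records that carry a score.

    Returns None when nothing was scored, so an unscored task is not
    silently reported as zero.
    """
    scored = [r.score for r in records if r.score is not None]
    if not scored:
        return None
    return sum(scored) / len(scored)
```

Keeping missing values as None rather than 0 avoids mixing "not measured" with "measured and poor" when the base and SFT variants are compared.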
Planned sections
- Research question and claim boundary
- Setup, model variants, data versions, and config hashes
- Eval suite or task design
- Measurements and failure modes
- Limitations, caveats, and next decision
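The config hashes mentioned in the setup section could be computed along these lines. This is a sketch under stated assumptions: the function name config_hash and the example keys are hypothetical, and it assumes each run config is a JSON-serializable dictionary. Hashing a canonical JSON form (sorted keys, fixed separators) means two configs with the same content produce the same hash regardless of key order.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any]) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical config keys, shown only to illustrate usage.
example_config = {"variant": "sft", "dataset": "instruction_v1"}
example_digest = config_hash(example_config)
```

Recording the digest next to each result row would let a reader check that the base and SFT runs being compared used the configs the report describes.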
Expected artifacts
- training/sft module.
- Base/SFT comparison table.
- Dataset card for instruction_v1.
Claim boundary
This report will evaluate behavior changes within a controlled task set, not general model quality.