Open Model Lab

Base vs SFT Behavior Change

Which behaviors improve or regress after supervised fine-tuning?

Status

Status: planned
Month/theme: August 2026: SFT Pipeline + Data Quality

This page is a report scaffold. It does not contain model scores, charts, or completed run results.

Research question

Which behaviors improve or regress after supervised fine-tuning?

Planned setup

Build the first SFT pipeline.
Create and document instruction_v1.
Evaluate base and SFT variants on the same eval suite.
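
As a minimal sketch of the comparison step, assuming each variant's run produces one score per task, the function below labels every task as improved, regressed, or unchanged. The function name, the tolerance value, the task names, and the numbers are all hypothetical placeholders chosen for illustration; they are not results from any run.

```python
from typing import Dict


def classify_changes(
    base: Dict[str, float],
    sft: Dict[str, float],
    tolerance: float = 0.01,  # assumed noise threshold, not a project setting
) -> Dict[str, str]:
    """Label each task as improved, regressed, or unchanged after SFT."""
    # Both variants must be scored on the same eval suite to be comparable.
    if base.keys() != sft.keys():
        raise ValueError("base and SFT runs must cover the same tasks")
    labels = {}
    for task in sorted(base):
        delta = sft[task] - base[task]
        if delta > tolerance:
            labels[task] = "improved"
        elif delta < -tolerance:
            labels[task] = "regressed"
        else:
            labels[task] = "unchanged"
    return labels


# Placeholder values for illustration only; not measured scores.
base_scores = {"task_a": 0.50, "task_b": 0.70, "task_c": 0.60}
sft_scores = {"task_a": 0.65, "task_b": 0.55, "task_c": 0.60}
labels = classify_changes(base_scores, sft_scores)
```

Rejecting mismatched task sets up front keeps a missing task from being silently read as a regression or an improvement.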

Planned measurements

A score for each task whose grader produces one.
Latency and cost where the run infrastructure can measure them.
Output-quality notes and failure-mode labels.
Known caveats and reproducibility requirements.
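
One way to hold these measurements is a per-example record in which score, latency, and cost are optional, so a task without a scoring grader or a run without timing data still yields a valid row. This is a sketch under that assumption; the record type, its field names, the failure-mode labels, and the sample values are hypothetical and do not describe the project's actual schema or results.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvalRecord:
    task_id: str
    variant: str                        # "base" or "sft"
    score: Optional[float] = None       # None when the grader gives no score
    latency_s: Optional[float] = None   # None when latency was not measured
    cost_usd: Optional[float] = None    # None when cost was not measured
    failure_modes: List[str] = field(default_factory=list)
    notes: str = ""


def mean_score(records: List[EvalRecord]) -> Optional[float]:
    """Average over scored records only; None if nothing was scored."""
    scored = [r.score for r in records if r.score is not None]
    return sum(scored) / len(scored) if scored else None


def failure_mode_counts(records: List[EvalRecord]) -> Counter:
    """Count how often each failure-mode label was assigned."""
    return Counter(label for r in records for label in r.failure_modes)


# Placeholder records for illustration only; not run output.
records = [
    EvalRecord("task_a", "sft", score=0.5, failure_modes=["format_error"]),
    EvalRecord("task_b", "sft", score=1.0),
    EvalRecord("task_c", "sft", failure_modes=["format_error", "refusal"]),
]
average = mean_score(records)
counts = failure_mode_counts(records)
```

Keeping unscored records in the same list means output-quality notes and failure-mode labels are not lost for tasks that cannot be scored.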

Planned sections

Research question and claim boundary
Setup, model variants, data versions, and config hashes
Eval suite or task design
Measurements and failure modes
Limitations, caveats, and next decision
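
A config hash can be sketched as a SHA-256 digest of a canonical JSON serialization, so two runs with identical settings produce the same identifier regardless of key order. This is an assumption about how such hashes might be computed, not the project's method; the function name and every config key and value other than instruction_v1 are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Return a stable SHA-256 hex digest for a JSON-serializable config."""
    # Sorted keys and fixed separators make the serialization canonical.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical config for illustration only.
config_a = {"dataset": "instruction_v1", "variant": "sft", "epochs": 1}
config_b = {"epochs": 1, "variant": "sft", "dataset": "instruction_v1"}
config_c = {"dataset": "instruction_v1", "variant": "sft", "epochs": 2}

hash_a = config_hash(config_a)
hash_b = config_hash(config_b)
hash_c = config_hash(config_c)
```

Here hash_a and hash_b match because only key order differs, while hash_c differs because a value changed.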

Expected artifacts

training/sft module.
Base/SFT comparison table.
Dataset card for instruction_v1.

Claim boundary

This report will evaluate behavior changes within a controlled task set, not general model quality.

Related links

Reports index
Related month page
Runs