Open Model Lab

Base vs SFT Behavior Change

Which behaviors improve or regress after supervised fine-tuning?

Status

Status: planned
Month/theme: August 2026: SFT Pipeline + Data Quality

This page is a report scaffold. It does not contain model scores, charts, or completed run results.

Research question

Which behaviors improve or regress after supervised fine-tuning?

Planned setup

  • Build the first SFT pipeline.
  • Create and document instruction_v1.
  • Evaluate base and SFT variants on the same eval suite.
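
The base-versus-SFT comparison in the last step could be sketched as below. This is a minimal illustration, not the planned pipeline: the function name compare_variants, the TaskDelta record, and the tolerance parameter are all hypothetical, and the sketch assumes each variant yields one numeric score per task.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TaskDelta:
    """Scores for one task under both variants (hypothetical record)."""
    task: str
    base: float
    sft: float

    @property
    def delta(self) -> float:
        # Positive means the SFT variant scored higher than the base model.
        return self.sft - self.base


def compare_variants(
    base_scores: Dict[str, float],
    sft_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> List[Tuple[TaskDelta, str]]:
    """Pair per-task scores and label each task improved, regressed, or unchanged.

    Raises ValueError if the two variants were not scored on the same tasks,
    since the comparison only makes sense on a shared eval suite.
    """
    if base_scores.keys() != sft_scores.keys():
        mismatched = sorted(base_scores.keys() ^ sft_scores.keys())
        raise ValueError(f"variants were not scored on the same tasks: {mismatched}")

    rows = []
    for task in sorted(base_scores):
        row = TaskDelta(task, base_scores[task], sft_scores[task])
        if row.delta > tolerance:
            label = "improved"
        elif row.delta < -tolerance:
            label = "regressed"
        else:
            label = "unchanged"
        rows.append((row, label))
    return rows
```

Rejecting mismatched task sets up front keeps the comparison honest: a task scored for only one variant cannot be labeled as an improvement or a regression. A nonzero tolerance would treat small score differences as unchanged; how large it should be is left open here.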

Planned measurements

  • Task score, where the grader produces one.
  • Latency and cost, where the run infrastructure can measure them.
  • Output-quality notes and failure-mode labels.
  • Known caveats and reproducibility requirements.
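
Because scores, latency, and cost are each recorded only where they are available, a per-run record probably needs optional fields. The sketch below shows one possible shape; the Measurement class, its field names, and the units (seconds, US dollars) are assumptions for illustration, not a committed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """One eval result for one model variant on one task (hypothetical schema)."""
    task: str
    variant: str                        # e.g. "base" or "sft"
    score: Optional[float] = None       # None when the grader does not produce a score
    latency_s: Optional[float] = None   # None when the run infrastructure cannot measure it
    cost_usd: Optional[float] = None    # None when the run infrastructure cannot measure it
    quality_notes: str = ""
    failure_modes: List[str] = field(default_factory=list)


def mean_score(measurements: List[Measurement]) -> Optional[float]:
    """Average over the measurements that have a score; None if none do."""
    scored = [m.score for m in measurements if m.score is not None]
    if not scored:
        return None
    return sum(scored) / len(scored)
```

Keeping a missing value as None, rather than 0, prevents unscored tasks from dragging down an average or being read as failures.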

Planned sections

  • Research question and claim boundary
  • Setup, model variants, data versions, and config hashes
  • Eval suite or task design
  • Measurements and failure modes
  • Limitations, caveats, and next decision
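
One common way to produce the config hashes mentioned in the list above is to hash a canonical JSON serialization of the run configuration. The sketch below assumes the config is a JSON-serializable dict; the function name config_hash and the choice of SHA-256 are illustrative, not a documented decision for this report.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Return a SHA-256 hex digest of a config dict, independent of key order."""
    # sort_keys and fixed separators make the serialization canonical, so two
    # dicts with the same content always produce the same bytes.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recording the digest alongside each result makes it possible to check later that the base and SFT runs used the intended settings, without storing the full config in every table row.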

Expected artifacts

  • training/sft module.
  • Base/SFT comparison table.
  • Dataset card for instruction_v1.

Claim boundary

This report will evaluate behavior changes within a controlled task set, not general model quality.