Open Model Lab

Score per GPU-Hour

Which data mixtures produce the most behavior gain per GPU-hour?

Status

Status: Planned
Month/theme: May 2027: Data Efficiency + Scaling Ladder

This page is a report scaffold. It does not contain model scores, charts, or completed run results.

Research question

Which data mixtures produce the most behavior gain per GPU-hour?

Planned setup

  • Build small-model ladder experiments.
  • Compare data mixtures and quality filters under constrained compute.
  • Check contamination control before interpreting behavior gains.
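One possible shape for the contamination check is an n-gram overlap test between training documents and eval items. The sketch below is illustrative only: the function names, the whitespace tokenization, and the default window of 8 tokens are assumptions, not part of the planned setup.

```python
def ngrams(text, n=8):
    """Return the set of lowercase whitespace-token n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(train_docs, eval_items, n=8):
    """Fraction of eval items sharing at least one n-gram with the training docs.

    Hypothetical helper: a real check would likely need normalization,
    deduplication, and a more careful tokenizer.
    """
    if not eval_items:
        return 0.0
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    # An eval item is flagged if any of its n-grams also appears in training data.
    flagged = [item for item in eval_items if ngrams(item, n) & train_grams]
    return len(flagged) / len(eval_items)
```

A nonzero rate would be a reason to hold off on interpreting behavior gains for the affected eval items until the overlap is explained or removed.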

Planned measurements

  • Score/GPU-hour.
  • Data mixture effect.
  • Quality filtering effect.
  • Contamination and leakage caveats.
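As a sketch of how the Score/GPU-hour measurement might be computed, the code below assumes that "behavior gain" means the eval score minus a shared baseline score, divided by the GPU-hours a run consumed. The `Run` record, its field names, and this definition of gain are assumptions made for illustration; the report may define them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record for one ladder run."""
    mixture: str      # label of the data mixture used
    score: float      # eval score after training
    gpu_hours: float  # total GPU-hours consumed by the run


def gain_per_gpu_hour(run, baseline_score):
    """Score gain over the baseline, normalized by GPU-hours (assumed definition)."""
    if run.gpu_hours <= 0:
        raise ValueError("gpu_hours must be positive")
    return (run.score - baseline_score) / run.gpu_hours


def rank_mixtures(runs, baseline_score):
    """Sort runs from highest to lowest gain per GPU-hour."""
    return sorted(
        runs,
        key=lambda r: gain_per_gpu_hour(r, baseline_score),
        reverse=True,
    )
```

Normalizing against a shared baseline keeps mixtures comparable when runs use different compute budgets, although it says nothing about whether the gain itself is free of contamination.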

Planned sections

  • Research question and claim boundary
  • Setup, model variants, data versions, and config hashes
  • Eval suite or task design
  • Measurements and failure modes
  • Limitations, caveats, and next decision
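The config hashes listed in the setup section could be derived by hashing a canonical serialization of each run's configuration. The following is a minimal sketch assuming configs are JSON-serializable dictionaries; the function name and the choice of SHA-256 are assumptions, not a stated part of the plan.

```python
import hashlib
import json


def config_hash(config):
    """Return a SHA-256 hex digest of a JSON-serializable config dict.

    Keys are sorted and separators fixed so that the same settings
    always produce the same hash regardless of insertion order.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recording this hash next to each row of a results table would let a reader confirm which exact configuration produced a given measurement.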

Expected artifacts

  • data_scaling module.
  • Data mixture comparison report.
  • Score/GPU-hour table.

Claim boundary

This report compares small scaling-ladder experiments only.