Open Model Lab

Open Model Lab Months

Detailed month pages for the planned Open Model Research Harness gates.

Month index

planned

Build the basic research infrastructure that can measure model behavior reliably before changing it.

planned

Measure the behavioral difference between a base model and an instruction-tuned model.

planned

Use preference data to improve SFT behavior in a more controlled way, then measure behavioral side effects.

planned

Move the model from passive question-answering into a tool-using agent setup.

planned

Measure agent success rates and build a failure taxonomy for multi-step tasks.

planned

Measure safety behavior by quality and balance, not just refusal rate.

planned

Evaluate not only final answers, but where reasoning processes break down.

planned

Start analyzing model behavior with internal signals in addition to external scores.

planned

Enter multimodal model work through screenshot and UI-understanding tasks.

planned

Add systems and profiling knowledge to make research experiments more efficient.

planned

Measure whether better data mixtures produce better behavior with the same compute.

planned

Turn the 12 months of work into a presentable, reproducible, publishable portfolio.