Open Model Lab

Open Model Lab Timeline

A compact timeline of planned gates, report targets, and status from July 2026 through June 2027.

July 2026 - June 2027

July 2026

Foundation + Eval Harness

Different open models can be compared on the same tasks, and a reproducible report can be generated from the results.

Planned report: Minimal Open-Model Eval Harness

planned

August 2026

SFT Pipeline + Data Quality

The work goes beyond fine-tuning itself: it measures the resulting behavior changes, including regressions.

Planned report: Base vs SFT Behavior Change

planned

September 2026

Preference Optimization / DPO

The report clearly shows where post-training helps and where it creates risk.

Planned report: DPO Behavior Impact

planned

October 2026

Agent Harness / Tool Use

Agent behavior is measured step by step, recording failure modes rather than only pass/fail outcomes.

Planned report: Post-Training and Agentic Coding Behavior

planned

November 2026

Agent Evals + Long-Horizon Tasks

A reliable evaluation system categorizes the reasons an agent fails.

Planned report: Coding Agent Failure Taxonomy

planned

December 2026

Safety / Red Teaming / Refusal Quality

The model's behavior is measured on both risky and harmless requests.

Planned report: Refusal and Jailbreak Evaluation

planned

January 2027

Reasoning Behavior + Process Evaluation

The system can flag flawed reasoning processes that still reach correct answers, and identify the points where reasoning breaks down and leads to wrong answers.

Planned report: Outcome vs Process Evaluation

planned

February 2027

Monitorability / Interpretability Start

Behavior changes can be tracked not only from outputs but also from model-internal measurements.

Planned report: Failure Prediction Probes

planned

March 2027

Multimodal / UI Understanding / Computer Use

The model's ability to understand visual interfaces is measured task-by-task.

Planned report: UI Understanding Evaluation

planned

April 2027

Training / Inference Systems Efficiency

The main bottlenecks that slow down model research infrastructure can be measured and reduced.

Planned report: Open-Model Systems Bottlenecks

planned

May 2027

Data Efficiency + Scaling Ladder

The effect of data quality on behavior and compute efficiency is quantified.

Planned report: Score per GPU-Hour

planned

June 2027

Final Integration + Public Portfolio

An open-source project demonstrates an end-to-end open-model research-engineering loop.

Planned report: Final Technical Report

planned