Starting the Open Model Research Harness

I am starting a 12-month public research-engineering project called Open Model Research Harness.

The goal is not to claim frontier-model capability. The goal is to build a small, reproducible, public workflow for studying how open-model behavior changes across evaluation, supervised fine-tuning, preference optimization, tool use, safety constraints, monitorability probes, and systems-efficiency work.

The first month is intentionally eval-first. Before changing a model, I want to measure it reliably: same task suite, same graders, same reporting structure, and visible failure modes. A useful result is not just a score. It is a report that says what improved, what broke, what the setup can claim, and what it cannot claim.
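As a rough illustration of why a score alone is not enough, here is a minimal sketch of a run comparison that reports per-task changes alongside the aggregate. The names `TaskResult` and `compare_runs`, and the pass/fail result shape, are assumptions made for this example; they are not the harness's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical per-task outcome produced by a grader."""
    task_id: str
    passed: bool


def compare_runs(baseline, candidate):
    """Compare two runs over the same task suite.

    Returns the aggregate score of each run plus the tasks that
    flipped from fail to pass ("improved") or pass to fail ("broke").
    """
    base = {r.task_id: r.passed for r in baseline}
    cand = {r.task_id: r.passed for r in candidate}
    if not base:
        raise ValueError("task suite is empty")
    if base.keys() != cand.keys():
        raise ValueError("runs must cover the same task suite")
    return {
        "baseline_score": sum(base.values()) / len(base),
        "candidate_score": sum(cand.values()) / len(cand),
        "improved": sorted(t for t in base if not base[t] and cand[t]),
        "broke": sorted(t for t in base if base[t] and not cand[t]),
    }


baseline = [TaskResult("a", True), TaskResult("b", False), TaskResult("c", True)]
candidate = [TaskResult("a", True), TaskResult("b", True), TaskResult("c", False)]
report = compare_runs(baseline, candidate)
```

In this toy case both runs score 2 out of 3, yet task `b` improved and task `c` broke. An aggregate score would hide that change; the per-task report makes it visible.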

The project will live in the Open Model Lab section of this site. Planned pages and empty dashboards are scaffolding for now. Results will be marked as published only after the underlying run, config, dataset, and report exist.
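The publishing rule can be expressed as a simple gate. This is a sketch under assumptions: the artifact names come from the rule above, but the mapping of names to file paths and the helpers `missing_artifacts` and `can_publish` are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# The four artifacts a result needs before it counts as published.
REQUIRED_ARTIFACTS = ("run", "config", "dataset", "report")


def missing_artifacts(artifacts):
    """Return the required artifact names that are absent or point to no file.

    `artifacts` maps an artifact name to a filesystem path (an assumed layout).
    """
    return [
        name
        for name in REQUIRED_ARTIFACTS
        if name not in artifacts or not Path(artifacts[name]).exists()
    ]


def can_publish(artifacts):
    """A result is publishable only when nothing required is missing."""
    return not missing_artifacts(artifacts)
```

Returning the list of missing names, rather than only a boolean, lets a dashboard say why a result is still marked as scaffolding.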