Open Model Lab

Decisions

A public decision log for scope, naming, eval-first sequencing, and claim boundaries.

Initial decision log

D001

Use "Open Model Research Harness" as the public project name

Decision: Use "Open Model Research Harness" as the project name and "Open Model Lab" as the name of the public section.

Rationale: The project borrows rigorous research-workflow patterns, but its public name should not imply that the project is itself a frontier AI lab.

Consequence: Public routes, tags, titles, and navigation should use open-model language, not frontier-lab branding.

Status: recorded

D002

Eval-first before model modification

Decision: The first month focuses on eval infrastructure before SFT, DPO, agents, or safety work.

Rationale: Without reproducible evals, every later model change is anecdotal.

Consequence: The first month (July) counts as successful only when multiple models can be compared on the same tasks, with score, cost, latency, and failure modes reported for each.

Status: recorded
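To make the D002 success criterion concrete, here is a minimal Python sketch of a per-example result record and a comparison step that refuses to compare models evaluated on different task sets. The `EvalResult` fields and the `compare` function are hypothetical, not the harness's actual schema; they simply mirror the four quantities named above (score, cost, latency, failure mode).

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


# Hypothetical per-example result record; field names are illustrative.
@dataclass(frozen=True)
class EvalResult:
    model: str
    task_id: str
    score: float                        # task score, e.g. in [0, 1]
    cost_usd: float                     # cost of this call in USD
    latency_s: float                    # wall-clock latency in seconds
    failure_mode: Optional[str] = None  # e.g. "format_error"; None if the call succeeded


def compare(results: list[EvalResult]) -> dict[str, dict]:
    """Summarize each model; refuse to compare models run on different task sets."""
    tasks_by_model: dict[str, set[str]] = {}
    for r in results:
        tasks_by_model.setdefault(r.model, set()).add(r.task_id)
    task_sets = list(tasks_by_model.values())
    if any(ts != task_sets[0] for ts in task_sets):
        raise ValueError("models were not evaluated on the same tasks")

    summary = {}
    for model in tasks_by_model:
        rows = [r for r in results if r.model == model]
        summary[model] = {
            "mean_score": mean(r.score for r in rows),
            "total_cost_usd": sum(r.cost_usd for r in rows),
            "mean_latency_s": mean(r.latency_s for r in rows),
            # Count how often each failure mode occurred; successes are skipped.
            "failure_modes": Counter(r.failure_mode for r in rows if r.failure_mode),
        }
    return summary
```

Raising an error on mismatched task sets, rather than silently comparing whatever overlaps, is one way to keep comparisons like-for-like; a real harness might instead report the shared subset explicitly.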
D003

No unsupported leaderboard

Decision: The site will not publish a leaderboard without real, reproducible runs.

Rationale: The project is a research-engineering portfolio, not benchmark marketing.

Consequence: Dashboards and report pages stay marked as planned, with no results shown, until the underlying configs, runs, and written reports exist.

Status: recorded
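One way to enforce D003 mechanically is a publish gate: a run feeds a dashboard or leaderboard only once its config, raw results, and report all exist. The sketch below assumes a hypothetical layout (`runs/<run-id>/` containing `config.yaml`, `results.jsonl`, and `report.md`); the file names and both functions are illustrative, not part of the project.

```python
from pathlib import Path

# Hypothetical layout: each run directory must contain all of these files
# before any dashboard or leaderboard entry is generated from it.
REQUIRED_ARTIFACTS = ("config.yaml", "results.jsonl", "report.md")


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the required artifacts absent from run_dir (empty list means publishable)."""
    return [name for name in REQUIRED_ARTIFACTS if not (run_dir / name).is_file()]


def publishable_runs(runs_root: Path) -> list[Path]:
    """Select only run directories that contain every required artifact."""
    if not runs_root.is_dir():
        return []  # nothing has been run yet, so there is nothing to publish
    return sorted(
        d for d in runs_root.iterdir() if d.is_dir() and not missing_artifacts(d)
    )
```

With no runs directory, or only incomplete runs, `publishable_runs` returns an empty list, which corresponds to dashboards staying empty.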
D004

Claim boundaries on every report

Decision: Every report must state what it does and does not claim.

Rationale: Small experiments are valuable, but overclaiming makes them less credible.

Consequence: All planned report pages include a claim-boundary section.

Status: recorded
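A report linter could check D004 before publication. This sketch assumes reports are Markdown with a heading named "Claim boundary" whose section contains a `Claims:` line and a `Does not claim:` line; the heading name, both labels, and the function names are hypothetical conventions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical convention: a Markdown heading "Claim boundary" (or "Claim boundaries")
# whose section contains a non-empty line for each required label.
CLAIM_HEADING = re.compile(
    r"^#{1,6}[ \t]+Claim boundar(?:y|ies)[ \t]*$", re.IGNORECASE | re.MULTILINE
)
ANY_HEADING = re.compile(r"^#{1,6}[ \t]", re.MULTILINE)
REQUIRED_LABELS = ("Claims:", "Does not claim:")


def claim_boundary_section(report_md: str) -> Optional[str]:
    """Return the text under the claim-boundary heading, or None if there is no such heading."""
    match = CLAIM_HEADING.search(report_md)
    if match is None:
        return None
    rest = report_md[match.end():]
    next_heading = ANY_HEADING.search(rest)  # the section ends at the next heading of any level
    return (rest[: next_heading.start()] if next_heading else rest).strip()


def check_report(report_md: str) -> list[str]:
    """List problems with a report's claim boundary; an empty list means it passes."""
    section = claim_boundary_section(report_md)
    if section is None:
        return ["missing claim-boundary section"]
    lines = [line.strip() for line in section.splitlines()]
    return [
        f"claim-boundary section lacks '{label}'"
        for label in REQUIRED_LABELS
        # A label only counts if some text follows it on the same line.
        if not any(line.startswith(label) and line[len(label):].strip() for line in lines)
    ]
```

Requiring both labels, not just the heading, matches the decision's wording: a report must say what it does not claim as well as what it does.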
D005

Static-first public reporting

Decision: The public site should be static and content-driven unless actual run data requires otherwise.

Rationale: The site should be easy to maintain and hard to break.

Consequence: Initial pages are generated from structured content, not a database.

Status: recorded
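As an illustration of D005, the sketch below renders decision-log entries held as structured data into a single static HTML page, with no database or server involved. The `DecisionEntry` schema, the function names, and the output layout are assumptions for this example, not the site's actual content model.

```python
from dataclasses import dataclass
from html import escape
from pathlib import Path


# Hypothetical content schema for one decision-log entry.
@dataclass(frozen=True)
class DecisionEntry:
    entry_id: str
    title: str
    decision: str
    rationale: str
    consequence: str
    status: str = "recorded"


def render_entry(e: DecisionEntry) -> str:
    """Render one entry as an HTML fragment, escaping every content field."""
    return (
        f'<section id="{escape(e.entry_id.lower())}">\n'
        f"  <h2>{escape(e.entry_id)}: {escape(e.title)}</h2>\n"
        f"  <p><strong>Decision:</strong> {escape(e.decision)}</p>\n"
        f"  <p><strong>Rationale:</strong> {escape(e.rationale)}</p>\n"
        f"  <p><strong>Consequence:</strong> {escape(e.consequence)}</p>\n"
        f"  <p><strong>Status:</strong> {escape(e.status)}</p>\n"
        "</section>"
    )


def build_page(entries: list[DecisionEntry], out: Path) -> None:
    """Write one static HTML page from structured entries, ordered by entry ID."""
    body = "\n".join(render_entry(e) for e in sorted(entries, key=lambda e: e.entry_id))
    out.write_text(
        f"<!doctype html>\n<title>Decisions</title>\n<main>\n{body}\n</main>\n",
        encoding="utf-8",
    )
```

Escaping every field with `html.escape` keeps content separate from markup, so a stray `<` or `&` in an entry cannot break the generated page structure.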