Open Model Lab

Dashboards

Dashboard pages are planned scaffolding. Charts will appear only after the underlying data is available and tied to published runs.

Status

Planned dashboard index. There are no public charts yet because there are no public runs yet.

planned

Will visualize: Per-run score, cost, and latency with task-suite and model context.

Required data: Published run records with model, eval suite, score, cost, and latency fields.
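
As a rough illustration of the required data, a run record could be sketched as below. Nothing here is a published schema: the class name, the field names, the units (USD, seconds), and the `published` flag are all hypothetical, chosen only to mirror the fields listed above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RunRecord:
    """Hypothetical run record; names and units are illustrative assumptions."""
    model: Optional[str] = None
    eval_suite: Optional[str] = None
    score: Optional[float] = None
    cost_usd: Optional[float] = None   # assumed unit: US dollars
    latency_s: Optional[float] = None  # assumed unit: seconds
    published: bool = False


def missing_fields(record: RunRecord) -> List[str]:
    """Return the names of fields that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def is_chartable(record: RunRecord) -> bool:
    """A record qualifies for a chart only if it is published and complete."""
    return record.published and not missing_fields(record)
```

Under this sketch, an unpublished or incomplete record would simply be skipped by the dashboard rather than plotted with gaps.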

planned

Will visualize: Failure-mode counts by model variant, task category, and report.

Required data: Runs with validated failure-mode labels.
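
One way the counts could be aggregated is sketched below, assuming each run is a dictionary with hypothetical keys `model_variant`, `task_category`, `report`, `labels_validated`, and `failure_modes`. None of these names come from an existing harness, and runs without validated labels are skipped.

```python
from collections import Counter


def count_failure_modes(runs, group_by="model_variant"):
    """Count failure-mode labels per value of one grouping key.

    `group_by` is assumed to be one of the hypothetical keys
    "model_variant", "task_category", or "report".
    """
    counts = Counter()
    for run in runs:
        # Only validated labels are counted; unvalidated runs are ignored.
        if not run.get("labels_validated", False):
            continue
        for label in run.get("failure_modes", []):
            counts[(run[group_by], label)] += 1
    return counts
```

Calling the function once per grouping key would give the three views named above without duplicating the counting logic.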

planned

Will visualize: Controlled behavior changes across base, SFT, and DPO variants.

Required data: Comparable runs on the same task suite, plus documented model cards.

planned

Will visualize: Step-by-step tool use, retries, errors, and task outcomes.

Required data: Replayable trace records from the agent harness.

planned

Will visualize: Refusal quality, over-refusal, under-refusal, jailbreak robustness, and helpfulness.

Required data: Safety eval runs over both risky and harmless requests.
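
The need for both request types can be seen in a minimal sketch of the two refusal rates. It assumes over-refusal means refusing a harmless request and under-refusal means complying with a risky one; the `risky` and `refused` keys are hypothetical, not fields of an existing eval format.

```python
def refusal_rates(results):
    """Compute over- and under-refusal rates from labeled eval results.

    Each result is assumed to be a dict with boolean keys
    "risky" (request label) and "refused" (model behavior).
    A rate is None when its request type is absent from the data.
    """
    harmless = [r for r in results if not r["risky"]]
    risky = [r for r in results if r["risky"]]

    # Over-refusal: share of harmless requests that were refused.
    over = sum(r["refused"] for r in harmless) / len(harmless) if harmless else None
    # Under-refusal: share of risky requests that were not refused.
    under = sum(not r["refused"] for r in risky) / len(risky) if risky else None

    return {"over_refusal_rate": over, "under_refusal_rate": under}
```

With only risky requests in the data, the over-refusal rate is undefined, which is why the eval runs must cover both kinds of request.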

planned

Will visualize: Behavior gain compared with training or inference compute cost.

Required data: Scaling-ladder runs with compute accounting and comparable score fields.
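
One possible reading of "behavior gain compared with compute cost" is the score change per unit of added compute between adjacent rungs of the ladder. The sketch below takes that interpretation as an assumption; the `name`, `compute`, and `score` keys are hypothetical, and the compute unit is left unspecified.

```python
def marginal_gains(rungs):
    """Score gain per unit of added compute between consecutive rungs.

    Each rung is assumed to be a dict with "name", "compute", and "score".
    Rungs are ordered by compute; pairs with no compute increase are skipped.
    """
    ordered = sorted(rungs, key=lambda r: r["compute"])
    gains = []
    for prev, cur in zip(ordered, ordered[1:]):
        d_compute = cur["compute"] - prev["compute"]
        if d_compute <= 0:
            continue  # avoid dividing by zero for equal-compute rungs
        gains.append({
            "from": prev["name"],
            "to": cur["name"],
            "gain_per_compute": (cur["score"] - prev["score"]) / d_compute,
        })
    return gains
```

The division is only meaningful when scores come from the same suite and compute is accounted the same way for every rung, which is what the required data asks for.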