Open Model Lab

Dashboards

The dashboard pages are currently scaffolding for planned work. Charts will appear only once the underlying data is available and tied to published runs.

Status

This page is a planned dashboard index. There are no public charts yet because no runs have been published.

Planned dashboards

planned

Score / Cost / Latency

Will visualize: Per-run score, cost, and latency with task-suite and model context.

Required data: Published run records with model, eval suite, score, cost, and latency fields.
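
As a rough illustration of the data requirement above, a check like the following could decide whether a run record is complete enough to chart. This is a sketch only: the key names in `REQUIRED_FIELDS` and the helper `missing_fields` are assumptions made for illustration, not a published schema.

```python
# Hypothetical key names for the five fields listed above (assumed, not a published schema).
REQUIRED_FIELDS = ("model", "eval_suite", "score", "cost", "latency")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or None in a run record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]
```

A record would be eligible for this dashboard only when `missing_fields(record)` returns an empty list.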

planned

Failure Mode Distribution

Will visualize: Failure-mode counts by model variant, task category, and report.

Required data: Runs with validated failure-mode labels.
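
The counts described above could be produced along these lines. This is a minimal sketch: the record keys (`model_variant`, `task_category`, `failure_labels`, `mode`, `validated`) are hypothetical names chosen for the example, and only labels marked as validated are counted.

```python
from collections import Counter


def count_failure_modes(runs):
    """Count validated failure-mode labels per (model variant, task category, mode).

    Each run is assumed to be a dict with hypothetical keys 'model_variant',
    'task_category', and 'failure_labels' (a list of dicts with 'mode' and 'validated').
    """
    counts = Counter()
    for run in runs:
        for label in run.get("failure_labels", []):
            if label.get("validated"):  # unvalidated labels are skipped
                key = (run["model_variant"], run["task_category"], label["mode"])
                counts[key] += 1
    return counts
```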

planned

Base vs SFT vs DPO Comparison

Will visualize: Controlled behavior changes across base, SFT, and DPO variants.

Required data: Comparable runs on the same task suite and documented model cards.
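
One way to enforce the "same task suite" requirement is to compute differences only for suites where all three variants have a score. The sketch below assumes a hypothetical mapping from `(suite, variant)` to score and the variant names `base`, `sft`, and `dpo`; it does not check model cards.

```python
def variant_deltas(scores):
    """Return per-suite score changes relative to the base variant.

    `scores` is assumed to map (suite, variant) -> score. Suites missing any
    of the three variants are skipped because they are not comparable.
    """
    suites = {suite for suite, _ in scores}
    deltas = {}
    for suite in sorted(suites):
        trio = [scores.get((suite, v)) for v in ("base", "sft", "dpo")]
        if any(s is None for s in trio):
            continue  # a variant is missing on this suite
        base, sft, dpo = trio
        deltas[suite] = {"sft_minus_base": sft - base, "dpo_minus_base": dpo - base}
    return deltas
```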

planned

Agent Traces

Will visualize: Step-by-step tool use, retries, errors, and task outcomes.

Required data: Replayable trace records from the agent harness.

planned

Safety / Helpfulness Trade-off

Will visualize: Refusal quality, over-refusal, under-refusal, jailbreak robustness, and helpfulness.

Required data: Safety eval runs over both risky and harmless requests.
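
Evaluating both risky and harmless requests matters because the two refusal error rates have different denominators. The sketch below assumes one common reading of the terms: over-refusal is the share of harmless requests that were refused, and under-refusal is the share of risky requests that were not refused. The function name and input shape are hypothetical.

```python
def refusal_rates(results):
    """Compute over- and under-refusal rates.

    `results` is assumed to be a list of (is_risky, refused) boolean pairs.
    A rate is None when its denominator is empty.
    """
    harmless = [refused for is_risky, refused in results if not is_risky]
    risky = [refused for is_risky, refused in results if is_risky]
    over = sum(harmless) / len(harmless) if harmless else None
    under = sum(not r for r in risky) / len(risky) if risky else None
    return {"over_refusal": over, "under_refusal": under}
```

With only risky requests in the input, `over_refusal` comes back as `None`, which is why the required data covers both kinds of request.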

planned

Score per GPU-Hour

Will visualize: Behavior gain compared with training or inference compute cost.

Required data: Scaling-ladder runs with compute accounting and comparable score fields.
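
The headline metric could be computed as score gain over a baseline divided by GPU-hours spent. This is a sketch under stated assumptions: the function name, the choice of a baseline score as the reference point, and the use of GPU-hours as the compute unit are illustrative, and both scores are assumed to come from the same comparable score field.

```python
def gain_per_gpu_hour(score: float, baseline_score: float, gpu_hours: float) -> float:
    """Return (score - baseline_score) / gpu_hours.

    Assumes both scores are on the same scale. Raises ValueError when
    compute accounting is missing or non-positive.
    """
    if gpu_hours <= 0:
        raise ValueError("gpu_hours must be positive")
    return (score - baseline_score) / gpu_hours
```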