Open Model Lab
Datasets
Dataset pages will document each dataset's purpose, source, license, filtering, deduplication, leakage checks, risks, and version before the dataset is used to support public claims.
Purpose
The project needs dataset cards because changes in model behavior cannot be interpreted without knowing which data was used for evals, supervised fine-tuning (SFT), direct preference optimization (DPO), agent tasks, or safety checks.
Eval leakage control is mandatory: training data must not contain eval tasks.
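As a rough illustration of the leakage rule, the sketch below flags training examples that match an eval task after light normalization. The function names (`normalize`, `fingerprint`, `find_leaked`) and the toy data are hypothetical and not part of the project. Exact matching is only one possible check, assumed here for simplicity.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not hide a match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def fingerprint(text: str) -> str:
    """Stable SHA-256 fingerprint of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_leaked(train_texts, eval_texts):
    """Return the training examples whose fingerprint matches any eval task."""
    eval_prints = {fingerprint(t) for t in eval_texts}
    return [t for t in train_texts if fingerprint(t) in eval_prints]


# Hypothetical toy data, not taken from any real dataset.
eval_tasks = ["What is 2 + 2?", "Name the capital of France."]
train_examples = ["what is  2 + 2?", "Summarize this paragraph."]

# The first training example differs from an eval task only in case and spacing.
leaked = find_leaked(train_examples, eval_tasks)
```

A check like this catches only copies that are identical after normalization. Paraphrased or partially overlapping tasks would need a fuzzier comparison, such as n-gram overlap, which this sketch does not attempt.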
Planned datasets
- eval_tasks_v1
- instruction_v1
- instruction_v2_filtered
- preference_v1
- agent_tasks_v1
- safety_eval_v1
Dataset card fields
- name
- purpose
- source
- license/usage constraints
- size
- categories
- filtering steps
- deduplication
- contamination/leakage checks
- risks
- version
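One way to make the fields above checkable is to hold a card as a small data structure and report which fields are still empty. This is a minimal sketch, not the project's actual schema: the class name `DatasetCard`, the field types, the `missing_fields` helper, and the placeholder values are all assumptions.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetCard:
    # Field names mirror the list above; the types are assumed for illustration.
    name: str
    purpose: str
    source: str
    license: str  # license/usage constraints
    size: int     # number of examples (unit assumed)
    categories: list = field(default_factory=list)
    filtering_steps: list = field(default_factory=list)
    deduplication: str = ""
    leakage_checks: list = field(default_factory=list)  # contamination/leakage checks
    risks: list = field(default_factory=list)
    version: str = ""

    def missing_fields(self):
        """Names of fields that are still empty strings, empty lists, or None."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in ("", [], None)]


# Placeholder values only; nothing here describes a real dataset.
card = DatasetCard(
    name="instruction_v1",
    purpose="supervised fine-tuning",
    source="TBD (placeholder)",
    license="TBD (placeholder)",
    size=0,
    version="v1",
)

missing = card.missing_fields()
```

Under this sketch, a card with a non-empty `missing` list would not yet be complete enough to back a public claim.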