Started: Jun 7, 2026
Updated: Jun 28, 2026

Current status

Public preview hardening

The twelve-disease 120-association public preview is published with integrity checks, methods material, visual QA, and a public reproducibility subset.

Latest public artifact: top120-public-preview-v0.1.3 package, Jun 28, 2026
Next checkpoint: Complete public access evidence, source freshness/freeze checks, existing-resource comparison, and the archive decision.

Roadmap: 9 / 10, 91% (effort-weighted)

✓ Define the public resource boundary: observed source evidence, not clinical guidance or complete biomedical knowledge
✓ Publish the first tagged public-candidate data releases with checksums, archives, data dictionary, and rendered pages
✓ Render the independent resource at /gene-disease-evidence/ with gene pages, methods, source caveats, and searchable associations
✓ Run a five-disease public-candidate pilot and verify public raw data, GitHub assets, reader review, and canonical render sync
✓ Generate the twelve-disease 120-association scale candidate: the released public preview aggregate reaches 12 disease scopes, 1,138 unique genes, 1,440 associations, and 1,505 evidence rows.
✓ Freeze the public preview checklist and reporting contract: gates cover source freshness, source silence/disagreement rollups, sampled-review packet, error taxonomy, clinical-language boundaries, archive checksums, and internal advisory review reports.
✓ Stabilize the 120-association candidate as a review-gated archive: the 2026-06-14 aggregate passed sampled human review, error taxonomy, source freshness acknowledgement, clinical-language boundary scan, and archive verification.
✓ Publish the twelve-disease aggregate public preview package with integrity manifest and archive
✓ Prepare methods tables, figure inputs, static SVG visual QA, and a public reproducibility subset from the fixed preview snapshot
Complete final public access evidence, existing-resource comparison, source freshness/freeze checks, and DOI/no-DOI archive decision

What this is

Gene-Disease Evidence Index is a research-use public resource for browsing gene-disease associations observed in included public sources. It is built around traceable records: genes, diseases, association ids, evidence rows, source metadata, release status, review artifacts, and caveats.

The current public resource is a preview, not a clinical product. It is designed to make source-attributed evidence inspectable without turning source signals into diagnosis, treatment guidance, patient risk prediction, clinical decision support, or claims of biological certainty.

This project sits at the overlap of my software architecture work, my LLM tooling, and my background in computational biology and bioinformatics.

Current public preview

The rendered preview has moved from the earlier five-disease public-candidate pilot to a twelve-disease 120-association public preview aggregate:

Alzheimer disease
Parkinson disease
Type 2 diabetes mellitus
Breast carcinoma
Dilated cardiomyopathy
Amyotrophic lateral sclerosis
Huntington disease
Cystic fibrosis
Schizophrenia
Epilepsy
Inflammatory bowel disease
Hypertrophic cardiomyopathy

As of 2026-06-28, the public preview contains 1,138 unique genes, 1,440 associations, 1,505 evidence rows, and 24 source snapshots across the twelve disease releases. The aggregate package top120-public-preview-v0.1.3 has an integrity manifest, archive checksum, aligned citation metadata, sampled human review, internal advisory review records, generated methods tables, figure inputs, static SVG visual QA, and a public reproducibility subset.
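For readers who want to check a downloaded copy of the package, an integrity manifest can be verified with a few lines of Python. This is a minimal sketch, not the project's actual tooling: it assumes a hypothetical manifest.json that maps relative file paths to SHA-256 hex digests, and the file name associations.tsv is only a stand-in created for the demo. The real manifest layout in top120-public-preview-v0.1.3 may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or whose digest does not match.

    Assumes a hypothetical manifest format: {"relative/path": "<sha256 hex>"}.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    failures = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


# Demo on a throwaway directory: build a one-file "package", verify it,
# then modify the file and verify again.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_file = root / "associations.tsv"  # stand-in file name
    data_file.write_text("gene\tdisease\n", encoding="utf-8")

    manifest_path = root / "manifest.json"
    manifest_path.write_text(
        json.dumps({"associations.tsv": sha256_of(data_file)}),
        encoding="utf-8",
    )

    failures_clean = verify_manifest(root, manifest_path)
    data_file.write_text("changed after release\n", encoding="utf-8")
    failures_modified = verify_manifest(root, manifest_path)

print(failures_clean)
print(failures_modified)
```

An empty list means every listed file matched its recorded digest. Any path in the list should be treated as a reason to re-download rather than as a data error in the release.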

What makes it useful

The intended contribution is not just a list of gene-disease pairs. The resource tracks how each association entered the release: which included source reported it, which source version was used, whether other included sources were silent, what caveats apply, and whether review artifacts exist.

That discipline matters because a simple table can easily overstate evidence. This project treats source provenance, source silence, review status, freshness, and release integrity as part of the data model rather than as afterthoughts.
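To make the idea of source silence concrete, here is a small sketch of how evidence rows could be rolled up per association into reporting and silent sources. The field names, id format, gene names, and source names are all hypothetical placeholders chosen for illustration; they are not the published data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRow:
    """One source-attributed observation of a gene-disease association."""
    association_id: str   # hypothetical id format
    gene: str
    disease: str
    source: str           # included source that reported the association
    source_version: str   # version or snapshot label of that source


def source_rollup(rows, included_sources):
    """Group rows by association and list which included sources reported it
    and which were silent (included in the release but reported nothing)."""
    reporting = {}
    for row in rows:
        reporting.setdefault(row.association_id, set()).add(row.source)
    return {
        assoc_id: {
            "reported_by": sorted(sources),
            "silent": sorted(set(included_sources) - sources),
        }
        for assoc_id, sources in reporting.items()
    }


# Placeholder rows: two sources included, one association seen by both,
# one seen by only a single source.
rows = [
    EvidenceRow("assoc-0001", "GENE_A", "Example disease", "source_x", "2026-06"),
    EvidenceRow("assoc-0001", "GENE_A", "Example disease", "source_y", "v12"),
    EvidenceRow("assoc-0002", "GENE_B", "Example disease", "source_x", "2026-06"),
]
rollup = source_rollup(rows, included_sources=["source_x", "source_y"])
print(rollup["assoc-0002"])
```

Silence here only means that an included source did not report the association in the snapshot used. It is recorded as a fact about the release, not as evidence against the association.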

Next steps

The live serkan.ai resource can change as the index improves. The current fixed public snapshot is top120-public-preview-v0.1.3, a twelve-disease research-use aggregate with public raw data, an integrity manifest, archive checksum, citation metadata, limitations, sampled human review, and public reproducibility material.

Next planning target: complete the remaining public-hardening checks around no-login access evidence, existing-resource comparison, source freshness/freeze review, and the DOI/no-DOI archive decision.

Boundaries

Not a complete database of all gene-disease knowledge
Not a causality claim
Not medical advice
Not diagnosis or treatment guidance
Not patient-risk prediction
Not clinical decision support
Not based on patient data
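
The roadmap mentions a clinical-language boundary scan. As a rough illustration of what such a check could look like, the sketch below flags lines of generated text that contain clinical or causal wording so that a human can review them. The phrase list is my own assumption for the example; the scan rules actually used for the release are not described on this page.

```python
import re

# Hypothetical phrase list, for illustration only.
CLINICAL_PATTERNS = [
    r"\bdiagnos(?:is|es|e|tic)\b",
    r"\btreat(?:ment|s)?\b",
    r"\bpatient risk\b",
    r"\bcauses?\b",
]
_SCAN = re.compile("|".join(CLINICAL_PATTERNS), re.IGNORECASE)


def flag_clinical_language(text):
    """Return (line_number, matched_phrase) pairs for lines that need review."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _SCAN.finditer(line):
            hits.append((number, match.group(0).lower()))
    return hits


sample = (
    "GENE_A is reported by source_x.\n"
    "GENE_B causes the disease and guides treatment."
)
hits = flag_clinical_language(sample)
print(hits)
```

A match is a prompt for review, not an automatic failure: a boundary statement such as "Not diagnosis or treatment guidance" would also match and is exactly the kind of wording that should stay.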

Status

Building. The 120-association public preview aggregate is published, integrity-checked, and backed by methods tables, generated figure artifacts, visual QA, and a public reproducibility subset. The next work is public-hardening and archive/freeze evidence, not clinical validation.

python biomedical-data evidence-indexing public-data source-provenance

gene-disease evidence-index biomedical-ai public-data research-use