Building · Fresh

Gene-Disease Evidence Index

A research-use, versioned index of source-attributed gene-disease evidence, currently released as a twelve-disease public preview aggregate with 120 associations per disease.

Under active development. Not usable yet.

Started: Jun 7, 2026
Updated: Jun 14, 2026
Roadmap: 8 / 9 · 88% · effort-weighted
  • Define the public resource boundary: observed source evidence, not clinical guidance or complete biomedical knowledge
  • Publish the first tagged public-candidate data releases with checksums, archives, a data dictionary, and rendered pages
  • Render the independent resource at /gene-disease-evidence/ with gene pages, methods, source caveats, and searchable associations
  • Run a five-disease public-candidate pilot and verify public raw data, GitHub assets, reader review, and canonical render sync
  • Generate the twelve-disease scale candidate (120 associations per disease): the released public preview aggregate reaches 12 disease scopes, 1,138 unique genes, 1,440 associations, and 1,505 evidence rows.
  • Freeze the public preview checklist and reporting contract: gates cover source freshness, source silence/disagreement rollups, the sampled-review packet, error taxonomy, clinical-language boundaries, archive checksums, and advisory referee reports.
  • Stabilize the 120-associations-per-disease candidate as a review-gated archive: the 2026-06-14 aggregate passed sampled human review, error taxonomy, source freshness acknowledgement, the clinical-language boundary scan, and archive verification.
  • Publish the twelve-disease aggregate public preview package with an integrity manifest and archive
  • Prepare formal analysis tables, limitations, methods text, and archive documentation from the fixed public preview snapshot
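The checklist gates can be pictured as a pass/fail report that must be green before a package is published. The sketch below is a minimal illustration under stated assumptions: the gate identifiers are hypothetical paraphrases of the checklist, and treating the referee reports as recorded but non-blocking is a guess based on the word "advisory", not a documented rule.

```python
# Hypothetical gate identifiers paraphrasing the public preview checklist.
BLOCKING_GATES = (
    "source_freshness",
    "source_silence_disagreement_rollups",
    "sampled_review_packet",
    "error_taxonomy",
    "clinical_language_boundaries",
    "archive_checksums",
)
# Assumption: referee reports are "advisory", so they are recorded but never block.
ADVISORY_GATES = ("advisory_referee_reports",)


def failing_gates(results: dict[str, bool]) -> list[str]:
    """Blocking gates that are missing from results or did not pass, in checklist order."""
    return [gate for gate in BLOCKING_GATES if not results.get(gate, False)]


def missing_advisory(results: dict[str, bool]) -> list[str]:
    """Advisory gates with no recorded result; reported, but never blocking."""
    return [gate for gate in ADVISORY_GATES if gate not in results]


def release_ready(results: dict[str, bool]) -> bool:
    """True only when every blocking gate passed."""
    return not failing_gates(results)


results = {gate: True for gate in BLOCKING_GATES}
results["error_taxonomy"] = False
print(failing_gates(results))     # ['error_taxonomy']
print(missing_advisory(results))  # ['advisory_referee_reports']
print(release_ready(results))     # False
```

Defaulting a missing gate to "not passed" keeps the check conservative: a gate that was never run cannot silently count as green.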

What this is

Gene-Disease Evidence Index is a research-use public resource for browsing gene-disease associations observed in the included public sources. It is built around traceable records: genes, diseases, association IDs, evidence rows, source metadata, release status, review artifacts, and caveats.
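One way to picture how these records link together is a small set of typed structures. This is a minimal sketch with hypothetical class and field names; the project's actual schema and data dictionary are not reproduced here, and the identifiers in the usage lines are placeholders, not index data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceSnapshot:
    """One retrieved version of an included public source (hypothetical schema)."""
    source_name: str
    source_version: str
    retrieved_on: str  # ISO 8601 date, e.g. "2026-06-14"


@dataclass(frozen=True)
class EvidenceRow:
    """A single source-attributed observation backing one association."""
    association_id: str
    snapshot: SourceSnapshot
    caveats: tuple[str, ...] = ()


@dataclass
class Association:
    """A gene-disease pair plus the evidence and review trail behind it."""
    association_id: str
    gene: str
    disease: str
    release_status: str  # e.g. "public-preview" (illustrative value)
    evidence: list[EvidenceRow] = field(default_factory=list)
    review_artifacts: list[str] = field(default_factory=list)

    def reporting_sources(self) -> set[str]:
        """Names of included sources that contributed at least one evidence row."""
        return {row.snapshot.source_name for row in self.evidence}


# Illustrative usage with placeholder identifiers (not real index data).
snapshot = SourceSnapshot("SOURCE_A", "v1", "2026-06-01")
assoc = Association("ASSOC-0001", "GENE1", "Alzheimer disease", "public-preview")
assoc.evidence.append(EvidenceRow("ASSOC-0001", snapshot, ("placeholder caveat",)))
print(assoc.reporting_sources())  # {'SOURCE_A'}
```

Keeping the source snapshot on every evidence row, rather than only on the association, is what lets each row be traced back to a specific source version.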

The current public resource is a preview, not a clinical product. It is designed to make source-attributed evidence inspectable without turning source signals into diagnosis, treatment guidance, patient risk prediction, clinical decision support, or claims of biological certainty.

This project sits where my software architecture and LLM tooling work overlaps with my background in computational biology and bioinformatics.

Current public preview

The rendered preview has moved from the earlier five-disease public-candidate pilot to a twelve-disease public preview aggregate with 120 associations per disease:

  • Alzheimer disease
  • Parkinson disease
  • Type 2 diabetes mellitus
  • Breast carcinoma
  • Dilated cardiomyopathy
  • Amyotrophic lateral sclerosis
  • Huntington disease
  • Cystic fibrosis
  • Schizophrenia
  • Epilepsy
  • Inflammatory bowel disease
  • Hypertrophic cardiomyopathy

As of 2026-06-14, the public preview contains about 1,138 unique genes, 1,440 associations, 1,505 evidence rows, and 24 source snapshots across the twelve disease releases. The aggregate package top120-public-preview-v0.1.1 has an integrity manifest and archive checksum, with sampled human review and advisory referee gates recorded upstream.
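The headline counts can be cross-checked with simple arithmetic. Reading "120-association" as a per-disease count reproduces the stated total exactly (12 × 120 = 1,440). The derived figures below additionally assume that each association is a distinct gene-disease pair with at least one evidence row; they are averages, not per-disease guarantees.

```python
# Headline counts as stated for the 2026-06-14 public preview.
diseases = 12
associations_per_disease = 120  # assumption: "120-association" is a per-disease count
total_associations = 1_440
evidence_rows = 1_505
source_snapshots = 24
unique_genes = 1_138

# The per-disease reading reproduces the stated association total exactly.
assert diseases * associations_per_disease == total_associations

print(source_snapshots // diseases)                  # 2 snapshots per disease release, on average
print(evidence_rows - total_associations)            # 65 rows beyond one per association
print(total_associations - unique_genes)             # 302 associations reuse an already-counted gene
print(round(evidence_rows / total_associations, 3))  # 1.045 mean evidence rows per association
```

The 302 figure is the number of associations left over once each gene is counted once, which implies that some genes appear under more than one of the twelve diseases.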

What makes it useful

The intended contribution is not just a list of gene-disease pairs. The resource tracks how each association entered the release: which included source reported it, which source version was used, whether other included sources were silent, what caveats apply, and whether review artifacts exist.
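Source silence can be made explicit per association by comparing the sources that reported it with the full set of sources included in the build. A minimal sketch with hypothetical source names; "silent" here means included but contributing no evidence row, and the sketch deliberately does not treat silence as evidence against the association.

```python
def silence_rollup(reporting: set[str], included: set[str]) -> dict[str, list[str]]:
    """Split the included sources into those that reported an association and those silent on it."""
    unexpected = reporting - included
    if unexpected:
        # A reporting source outside the build's included set points to a provenance error.
        raise ValueError(f"sources not in the included set: {sorted(unexpected)}")
    return {
        "reported_by": sorted(reporting),
        # Silence is absence of a report, not a negative finding.
        "silent": sorted(included - reporting),
    }


included_sources = {"SOURCE_A", "SOURCE_B"}
print(silence_rollup({"SOURCE_A"}, included_sources))
# {'reported_by': ['SOURCE_A'], 'silent': ['SOURCE_B']}
```

Raising on an unexpected reporting source keeps provenance errors visible instead of letting them inflate the "reported by" column.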

That discipline matters because a simple table can easily overstate evidence. This project treats source provenance, source silence, review status, freshness, and release integrity as part of the data model rather than as afterthoughts.
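Freshness can be handled the same way: flag source snapshots older than some threshold so the release can acknowledge them explicitly. The project does not state a threshold; the 90 days below is a placeholder, and the function name and source names are hypothetical.

```python
from datetime import date


def stale_sources(retrieved_on: dict[str, date], as_of: date, max_age_days: int) -> list[str]:
    """Names of sources whose snapshot is more than max_age_days old on the as_of date."""
    return sorted(
        name for name, taken in retrieved_on.items()
        if (as_of - taken).days > max_age_days
    )


snapshots = {"SOURCE_A": date(2026, 6, 1), "SOURCE_B": date(2025, 12, 1)}
# 90 days is a placeholder threshold, not a documented project setting.
print(stale_sources(snapshots, as_of=date(2026, 6, 14), max_age_days=90))  # ['SOURCE_B']
```

Passing `as_of` explicitly, rather than reading today's date, keeps the check reproducible for a fixed snapshot.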

Next steps

The live serkan.ai resource can change as the index improves. The current fixed public snapshot is top120-public-preview-v0.1.1, a twelve-disease research-use aggregate with public raw data, an integrity manifest, archive checksum, limitations, and sampled human review.
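Readers who download the fixed snapshot can check files against its integrity manifest. The manifest format is not described on this page, so the sketch below assumes a `sha256sum`-style text file (one `<hex digest>  <relative path>` per line); the directory and file names in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> list[str]:
    """Return manifest entries whose file is missing or whose SHA-256 does not match.

    Assumes sha256sum-style lines: '<hex digest>  <path relative to the manifest>'
    (a leading '*' on the path, sha256sum's binary-mode marker, is ignored).
    """
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")
        target = manifest.parent / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures


# Hypothetical layout: an unpacked snapshot directory containing a SHA256SUMS file.
# An empty list means every listed file matched its recorded digest.
# print(verify_manifest(Path("top120-public-preview-v0.1.1") / "SHA256SUMS"))
```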

Next planning target: turn the fixed aggregate and gate reports into clearer analysis tables, limitations, methods text, and archive documentation. If a formal paper is published later, this page can link to it directly.

Boundaries

  • Not a complete database of all gene-disease knowledge
  • Not a causality claim
  • Not medical advice
  • Not diagnosis or treatment guidance
  • Not patient-risk prediction
  • Not clinical decision support
  • Not based on patient data
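The roadmap mentions a clinical-language boundary scan. One minimal way to sketch such a check is a phrase scan over rendered page text. The pattern list below is a hypothetical example, not the project's actual list. A naive scan also flags negated boundary statements such as the ones above, so hits are better treated as items for human review than as automatic failures.

```python
import re

# Hypothetical phrase patterns that would push page text toward clinical guidance.
CLINICAL_PATTERNS = (
    r"\bdiagnos(?:e|es|ed|is)\b",
    r"\brecommended (?:treatment|dose|therapy)\b",
    r"\byour risk\b",
    r"\bpatients? should\b",
)


def clinical_language_hits(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for every flagged phrase, case-insensitively."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in CLINICAL_PATTERNS:
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, match.group(0)))
    return hits


page = "Browse source-attributed evidence.\nNot diagnosis or treatment guidance"
print(clinical_language_hits(page))  # [(2, 'diagnosis')]
```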

Status

Building. The twelve-disease public preview aggregate (120 associations per disease) is now published and integrity-checked. The next work is formal analysis: evaluation tables, limitations, methods text, and archive documentation.

python · biomedical-data · evidence-indexing · public-data · source-provenance