
Part 3 of 6
Variant Classification Series

Evidence Gathering: Navigating the Genomic Data Ecosystem

November 23, 2025
20 min read

Ryan Wentzel

Founder & CEO, Humanome.AI

1. Introduction: The Detective Work of Curation

Variant curation is, at its core, an exercise in information retrieval and synthesis. The curator acts as a detective, gathering clues from disparate sources to build a case for or against pathogenicity. In the modern era, this does not mean flipping through textbooks; it means navigating a complex ecosystem of interconnected databases and computational tools.

Figure 1: The Evidence Ecosystem. Population data (gnomAD, All of Us), clinical data (ClinVar, HGMD), and computational predictions (SpliceAI, REVEL) all feed into the curator, who synthesizes the evidence using ACMG rules to assign a classification.

2. Population Databases

We covered this extensively in our dedicated post, but to recap their role in the ACMG framework:

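  • gnomAD (v2/v3/v4): The primary source of population allele frequencies. Used for BA1 (allele frequency >5%), BS1 (frequency greater than expected for the disorder), and PM2 (absent from population controls, or extremely rare for recessive disorders).
  • All of Us: Critical for non-European ancestries, where sparse reference data can make common benign variants look rare. Manrai et al. (2016) showed how this gap led to benign variants common in African American individuals being misclassified as pathogenic for hypertrophic cardiomyopathy.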
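The frequency criteria above can be expressed as a small decision function. This is a minimal sketch, not a complete ACMG implementation: it assumes the input is the highest ancestry-group allele frequency from gnomAD (None or 0.0 meaning the variant is absent), and because BS1 has no universal cutoff, the disease-specific BS1 threshold is a caller-supplied parameter. The function name frequency_criterion is hypothetical.

```python
from typing import Optional

BA1_THRESHOLD = 0.05  # BA1: allele frequency above 5%


def frequency_criterion(af: Optional[float], bs1_threshold: float) -> Optional[str]:
    """Suggest an ACMG population-frequency code for one variant.

    af: highest ancestry-group allele frequency (None or 0.0 = absent).
    bs1_threshold: hypothetical, disease-specific frequency above which
    BS1 applies; it must be set per disease, it is not universal.
    """
    if af is None or af == 0.0:
        return "PM2"   # absent from population controls
    if af > BA1_THRESHOLD:
        return "BA1"   # stand-alone benign
    if af > bs1_threshold:
        return "BS1"   # more common than the disorder allows
    return None        # frequency alone is uninformative


print(frequency_criterion(0.12, bs1_threshold=1e-4))  # BA1
print(frequency_criterion(None, bs1_threshold=1e-4))  # PM2
```

BA1 is checked before BS1 because a variant above 5% will usually also exceed any BS1 threshold, and the stronger, stand-alone criterion should take precedence.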

3. Disease Databases

3.1. ClinVar: The Gold Standard

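ClinVar is NCBI's public archive of reports on the relationships between human variants and phenotypes. It does not classify variants itself; it aggregates interpretations submitted by clinical laboratories, researchers, and expert panels, which is why the review status of each record matters.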

Understanding the Star System

  • ⭐⭐⭐⭐ Practice Guideline: Classification taken from a professional practice guideline (e.g., ACMG). Highest confidence.
  • ⭐⭐⭐ Reviewed by Expert Panel: Classified by a ClinGen-recognized Variant Curation Expert Panel. High confidence.
  • ⭐⭐ Criteria Provided, Multiple Submitters, No Conflicts: Generally reliable.
  • ⭐ Criteria Provided, Single Submitter (or Conflicting Classifications): Use with caution and review the submitted evidence.
  • (No Star) No Assertion Criteria Provided: Do not trust blindly.
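When ClinVar records are pulled programmatically, the review status arrives as a text string rather than as stars. A minimal sketch of the mapping, assuming the lowercase review-status wording ClinVar currently uses plus the older "conflicting interpretations" and "no assertion provided" phrasings; REVIEW_STATUS_STARS, review_stars, and meets_min_stars are hypothetical names, and the string list should be checked against ClinVar's own documentation.

```python
# Review-status strings (lowercase) mapped to ClinVar star ratings.
# Wording is assumed; verify against ClinVar's review status documentation.
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, conflicting interpretations": 1,  # older wording
    "no assertion criteria provided": 0,
    "no classification provided": 0,
    "no classification for the individual variant": 0,
    "no assertion provided": 0,  # older wording
}


def review_stars(review_status: str) -> int:
    """Return the star rating for a review status (0 if unrecognized)."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def meets_min_stars(review_status: str, min_stars: int = 2) -> bool:
    """True if the record reaches the curator's chosen review-status floor."""
    return review_stars(review_status) >= min_stars
```

A two-star default floor mirrors the list above: records below it are leads to investigate rather than evidence to cite.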

3.2. HGMD (Human Gene Mutation Database)

HGMD is a subscription-based database that indexes disease-associated variants reported in the published literature. It is comprehensive, but it is "polluted" with variants from older papers that were declared pathogenic without meeting modern evidence standards, and some of these have since proven benign. Rule of thumb: use HGMD to find the paper, then read the paper yourself to verify the claim.

4. Locus-Specific Databases (LSDBs)

For some genes, general databases are not enough. LSDBs are curated by experts in a specific gene or disease.

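  • LOVD (Leiden Open Variation Database): An open-source platform hosting variant databases for many genes.
  • BRCA Exchange: The leading resource for BRCA1/2, aggregating variant data alongside ENIGMA expert-panel classifications.
  • InSiGHT: The reference for the mismatch repair genes (MLH1, MSH2, MSH6, PMS2) implicated in Lynch syndrome.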
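In a pipeline, the choice of LSDB can be made per gene. The table below is a hypothetical illustration built only from the resources listed above, with LOVD as the fallback; PREFERRED_LSDB and lsdb_for are illustrative names, and a real deployment would need a curated, much longer mapping.

```python
# Hypothetical gene-to-LSDB routing table, limited to the resources above.
PREFERRED_LSDB = {
    "BRCA1": "BRCA Exchange",
    "BRCA2": "BRCA Exchange",
    "MLH1": "InSiGHT",
    "MSH2": "InSiGHT",
    "MSH6": "InSiGHT",
    "PMS2": "InSiGHT",
}


def lsdb_for(gene_symbol: str) -> str:
    """Return the preferred LSDB for a gene, defaulting to LOVD."""
    return PREFERRED_LSDB.get(gene_symbol.upper(), "LOVD")
```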

5. In-Silico Tools (PP3/BP4)

Computational prediction has evolved from simple conservation scores to deep learning models.

The Old Guard

SIFT and PolyPhen-2 rely largely on evolutionary conservation and amino acid properties. Used alone, they have high false-positive rates, flagging many benign missense variants as damaging.

The New Standard

REVEL: An ensemble score that combines 13 individual predictors (including SIFT and PolyPhen-2). Designed for missense variants only, it outperforms its component tools on them.
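A REVEL score only becomes ACMG evidence once it is mapped to a criterion and a strength. The sketch below assumes the calibrated thresholds published by the ClinGen Sequence Variant Interpretation working group (Pejaver et al., 2022); the values are stated here as assumptions and should be verified against that paper before use. Scores between the benign and pathogenic ranges yield no evidence. revel_evidence is a hypothetical helper name.

```python
from typing import Optional, Tuple

# ASSUMED ClinGen SVI calibrated REVEL thresholds (Pejaver et al., 2022).
# Verify against the published table before any clinical use.
PP3_THRESHOLDS = [(0.932, "Strong"), (0.773, "Moderate"), (0.644, "Supporting")]
BP4_THRESHOLDS = [(0.003, "Very Strong"), (0.016, "Strong"),
                  (0.183, "Moderate"), (0.290, "Supporting")]


def revel_evidence(score: float) -> Optional[Tuple[str, str]]:
    """Return (criterion, strength) for a REVEL score, or None if indeterminate."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("REVEL scores range from 0 to 1")
    for cutoff, strength in PP3_THRESHOLDS:  # strongest (highest cutoff) first
        if score >= cutoff:
            return ("PP3", strength)
    for cutoff, strength in BP4_THRESHOLDS:  # strongest (lowest cutoff) first
        if score <= cutoff:
            return ("BP4", strength)
    return None


print(revel_evidence(0.85))  # ('PP3', 'Moderate')
print(revel_evidence(0.50))  # None
```

Because REVEL applies to missense variants only, the caller should filter by predicted consequence before scoring.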

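SpliceAI: A deep neural network that predicts splice-altering effects directly from sequence, often outperforming traditional tools such as MaxEntScan. Its delta scores (0 to 1) can be read as the probability that a variant creates or destroys a splice acceptor or donor site.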
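SpliceAI's VCF annotation packs four delta scores (acceptor gain/loss, donor gain/loss) and their positions into a pipe-delimited INFO value with the layout ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL. This sketch takes the maximum delta score per allele and gene, then applies the cutoffs recommended by ClinGen SVI for splicing evidence (≥0.2 for PP3, ≤0.1 for BP4; Walker et al., 2023), which are stated here as assumptions to verify. parse_spliceai and spliceai_evidence are hypothetical names, and the annotation string below is illustrative.

```python
from typing import Dict, List, Optional


def parse_spliceai(info_value: str) -> List[Dict]:
    """Parse a SpliceAI INFO value; multiple allele/gene entries are comma-separated."""
    records = []
    for entry in info_value.split(","):
        fields = entry.split("|")
        if len(fields) != 10:
            continue  # skip malformed entries rather than guess
        allele, gene = fields[0], fields[1]
        delta_scores = [float(x) for x in fields[2:6]]  # DS_AG, DS_AL, DS_DG, DS_DL
        records.append({"allele": allele, "gene": gene, "max_delta": max(delta_scores)})
    return records


def spliceai_evidence(max_delta: float, pp3_cutoff: float = 0.2,
                      bp4_cutoff: float = 0.1) -> Optional[str]:
    """Map a maximum delta score to PP3, BP4, or no evidence (assumed cutoffs)."""
    if max_delta >= pp3_cutoff:
        return "PP3"
    if max_delta <= bp4_cutoff:
        return "BP4"
    return None


for rec in parse_spliceai("T|RYR1|0.00|0.00|0.91|0.08|-28|-46|-2|-31"):
    print(rec["gene"], rec["max_delta"], spliceai_evidence(rec["max_delta"]))  # RYR1 0.91 PP3
```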

6. Literature Search

Finding the paper is half the battle.

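  • PubMed: The classic. Use Boolean operators and include every name the variant goes by, since papers mix cDNA (c.), protein (p.), legacy, and rsID nomenclature (e.g., "Gene AND (c. name OR p. name OR rsID)").
  • Mastermind (Genomenon): A genomic search engine that indexes the full text and supplementary material of articles. Because PubMed searches citations and abstracts rather than full text, Mastermind is far better at finding variants buried in supplementary tables.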
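The Boolean pattern above can be automated against NCBI's E-utilities esearch endpoint. A minimal sketch using only the Python standard library: build_pubmed_query and search_pubmed are hypothetical helper names, and the gene and variant names in the usage line are placeholders to be replaced with the variant's real cDNA, protein, legacy, and rsID names.

```python
import json
import urllib.parse
import urllib.request
from typing import List

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene: str, aliases: List[str]) -> str:
    """Build GENE AND ("alias1" OR "alias2" ...); quotes keep each name a phrase."""
    joined = " OR ".join(f'"{alias}"' for alias in aliases)
    return f"{gene} AND ({joined})"


def search_pubmed(query: str, retmax: int = 100) -> List[str]:
    """Return up to `retmax` PubMed IDs (PMIDs) matching `query`."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    )
    with urllib.request.urlopen(f"{ESEARCH_URL}?{params}", timeout=30) as resp:
        payload = json.load(resp)
    return payload["esearchresult"]["idlist"]


query = build_pubmed_query("GENE", ["c.100C>T", "p.Arg34Ter", "rsXXXX"])
print(query)  # GENE AND ("c.100C>T" OR "p.Arg34Ter" OR "rsXXXX")
# pmids = search_pubmed(query)  # network call to NCBI
```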

7. Automation & APIs

Manual evidence gathering does not scale. Modern bioinformatics pipelines automate this phase through database APIs such as NCBI's E-utilities:

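```python
# Example: Automating ClinVar lookup (Python)
import requests

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def get_clinvar_summary(variation_id):
    """Fetch the E-utilities summary for a ClinVar Variation ID as parsed JSON."""
    params = {"db": "clinvar", "id": variation_id, "retmode": "json"}
    # Let requests encode the query string; time out instead of hanging.
    response = requests.get(ESUMMARY_URL, params=params, timeout=30)
    response.raise_for_status()  # surface HTTP errors (e.g., rate limiting) early
    return response.json()
```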
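The raw esummary JSON from get_clinvar_summary still has to be reduced to the fields a curator needs. The sketch below assumes the layout E-utilities returns for db=clinvar: a "result" object listing "uids", with each record carrying "accession", "title", and a "germline_classification" block (named "clinical_significance" in older responses) that holds "description" and "review_status". These field names are assumptions to check against a live response, and extract_classifications is a hypothetical helper.

```python
from typing import Dict, List


def extract_classifications(esummary_json: dict) -> List[Dict[str, str]]:
    """Flatten a ClinVar esummary response into one row per Variation ID."""
    result = esummary_json.get("result", {})
    rows = []
    for uid in result.get("uids", []):
        record = result.get(uid, {})
        # Assumed field names: newer "germline_classification", older "clinical_significance".
        cls = (record.get("germline_classification")
               or record.get("clinical_significance")
               or {})
        rows.append({
            "uid": uid,
            "accession": record.get("accession", ""),
            "title": record.get("title", ""),
            "classification": cls.get("description", ""),
            "review_status": cls.get("review_status", ""),
        })
    return rows


# Usage with the lookup above:
# rows = extract_classifications(get_clinvar_summary("12345"))
```

esummary also accepts several comma-separated IDs in one request, which helps stay within NCBI's limit of three requests per second without an API key (ten with one).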

Tags

ClinVar
gnomAD
Bioinformatics
Curation