1. Introduction: The Detective Work of Curation
Variant curation is, at its core, an exercise in information retrieval and synthesis. The curator acts as a detective, gathering clues from disparate sources to build a case for or against pathogenicity. In the modern era, this does not mean flipping through textbooks; it means navigating a complex ecosystem of interconnected databases and computational tools.
Figure 1: The Evidence Ecosystem. At the center, The Curator synthesizes evidence using ACMG rules to assign a classification.
2. Population Databases
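We covered population databases extensively in a dedicated post; to recap their role in the ACMG framework:
- gnomAD (v2/v3/v4): The primary source for allele frequencies. Used for BA1 (allele frequency >5%), BS1 (allele frequency greater than expected for the disorder), and PM2 (absent from controls, or extremely rare for recessive disorders).
- All of Us: Its ancestrally diverse cohort is critical for interpreting variants in people of non-European ancestry, helping avoid false-positive pathogenic calls like those described by Manrai et al. (2016), where benign variants were misclassified as causing hypertrophic cardiomyopathy in Black Americans.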
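The frequency rules above reduce to a few comparisons. The sketch below assumes a single allele frequency per variant (for example, the highest continental-group frequency that gnomAD reports as popmax/grpmax) and a disease-specific BS1 cutoff chosen by the curator. The function name and the `bs1_cutoff` parameter are hypothetical; the ACMG guidelines do not define one universal BS1 value. For simplicity it treats PM2 as strict absence and ignores the "extremely rare" allowance for recessive disorders.

```python
from typing import List, Optional

BA1_CUTOFF = 0.05  # BA1: allele frequency above 5% is stand-alone benign evidence


def frequency_criteria(allele_freq: Optional[float], bs1_cutoff: float) -> List[str]:
    """Return the ACMG population-frequency criteria met by one variant.

    allele_freq: population allele frequency; None or 0.0 means "not observed".
    bs1_cutoff: hypothetical, disease-specific threshold chosen by the curator.
    """
    if allele_freq is None or allele_freq == 0.0:
        return ["PM2"]  # absent from controls (strict absence only in this sketch)
    if allele_freq > BA1_CUTOFF:
        return ["BA1"]  # BA1 subsumes BS1, so report only the stronger code
    if allele_freq > bs1_cutoff:
        return ["BS1"]  # more common than the disorder can plausibly explain
    return []  # observed, but rare enough that no frequency criterion applies
```

For example, `frequency_criteria(0.002, bs1_cutoff=0.001)` returns `["BS1"]`, while `frequency_criteria(None, 0.001)` returns `["PM2"]`.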
3. Disease Databases
3.1. ClinVar: The Gold Standard
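ClinVar is NCBI's public archive of reports on the relationships between human variants and phenotypes. NCBI does not curate the classifications itself: ClinVar aggregates submissions from clinical laboratories, researchers, and expert panels, and the review status (star rating) tells you how much scrutiny a given classification has received.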
Understanding the Star System
- ⭐⭐⭐⭐ Practice Guideline: Classification taken from a published professional practice guideline. Highest level of review.
- ⭐⭐⭐ Expert Panel: Reviewed by a recognized expert panel (e.g., a ClinGen Variant Curation Expert Panel). High confidence.
- ⭐⭐ Criteria Provided, Multiple Submitters, No Conflicts: Generally reliable.
- ⭐ Criteria Provided, Single Submitter, or Criteria Provided, Conflicting Classifications: Use with caution.
- (No Star) No Assertion Criteria Provided / No Classification Provided: Do not trust blindly.
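When ClinVar data is processed in bulk, the review status arrives as text rather than as stars. A minimal sketch of the mapping, assuming the review-status strings as ClinVar writes them; ClinVar has used both an older "interpretations" wording and a newer "classifications" wording, so both are listed. The helper names and the two-star acceptance policy in `meets_minimum` are hypothetical lab choices, not ClinVar rules.

```python
# Star ratings keyed by ClinVar review-status text (lower-cased).
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting interpretations": 1,  # older wording
    "criteria provided, conflicting classifications": 1,  # newer wording
    "no assertion criteria provided": 0,
    "no assertion provided": 0,                           # older wording
    "no classification provided": 0,                      # newer wording
}


def stars_for(review_status: str) -> int:
    """Return 0-4 stars; any unrecognized status is treated as 0 (least trusted)."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def meets_minimum(review_status: str, min_stars: int = 2) -> bool:
    """Hypothetical policy: accept ClinVar's aggregate call only at min_stars or above."""
    return stars_for(review_status) >= min_stars
```

Under the default two-star policy, `meets_minimum("criteria provided, single submitter")` is `False`, which matches the "use with caution" advice for one-star records.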
3.2. HGMD (Human Gene Mutation Database)
HGMD indexes variants reported in the published literature; its complete, current version (HGMD Professional) requires a subscription. While comprehensive, it is known to be "polluted" by older reports that called variants pathogenic without meeting modern evidence standards. Rule of thumb: use HGMD to find the paper, but read the paper yourself to verify the claim.
4. Locus Specific Databases (LSDBs)
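For many disease genes, general databases are not enough. LSDBs are curated by experts in a specific gene or disease.
- LOVD (Leiden Open Variation Database): A platform hosting LSDBs for many genes.
- BRCA Exchange: The go-to resource for BRCA1/2, aggregating public data alongside ENIGMA expert-panel classifications.
- InSiGHT: The database for the mismatch repair genes (MLH1, MSH2, MSH6, PMS2) implicated in Lynch syndrome.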
5. In-Silico Tools (PP3/BP4)
Computational prediction has evolved from simple conservation scores to deep learning models.
The Old Guard
SIFT and PolyPhen-2: Based largely on evolutionary conservation and amino acid properties (PolyPhen-2 also uses structural features). Relatively high false-positive rates.
The New Standard
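REVEL: An ensemble (random forest) score that combines 13 individual predictors, including SIFT and PolyPhen-2. It applies only to missense variants, where it outperformed its component tools in benchmarking.
SpliceAI: A deep neural network that predicts, from the pre-mRNA sequence, whether a variant creates or disrupts a splice site. In benchmarks it often outperforms traditional tools such as MaxEntScan.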
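A common way to operationalize PP3/BP4 is to compare REVEL (missense) and SpliceAI (splicing) scores against fixed cutoffs. The sketch below deliberately hard-codes no numbers: the `Cutoffs` values must come from the calibration your protocol follows (ClinGen's Sequence Variant Interpretation group has published calibrated thresholds). The names and the combination rule are assumptions: a predicted splice effect triggers PP3 on its own, and BP4 requires every available score to point toward no impact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cutoffs:
    """Score cutoffs taken from your lab's protocol (no defaults on purpose)."""
    revel_pp3: float   # REVEL at or above this supports pathogenicity
    revel_bp4: float   # REVEL at or below this supports a benign effect
    splice_pp3: float  # SpliceAI max delta at or above this predicts a splicing change
    splice_bp4: float  # SpliceAI max delta at or below this predicts no splicing change


def computational_criterion(
    revel: Optional[float], spliceai: Optional[float], cut: Cutoffs
) -> Optional[str]:
    """Return "PP3", "BP4", or None; a None score means "not applicable"."""
    # Either predictor alone can support pathogenicity.
    if spliceai is not None and spliceai >= cut.splice_pp3:
        return "PP3"
    if revel is not None and revel >= cut.revel_pp3:
        return "PP3"
    # BP4 only when every available score is in its benign range, so a low
    # REVEL score cannot mask an intermediate SpliceAI prediction.
    available = [
        (score, limit)
        for score, limit in ((revel, cut.revel_bp4), (spliceai, cut.splice_bp4))
        if score is not None
    ]
    if available and all(score <= limit for score, limit in available):
        return "BP4"
    return None
```

With placeholder cutoffs `Cutoffs(0.7, 0.3, 0.2, 0.1)` (illustrative only, not recommendations), a variant scoring REVEL 0.1 and SpliceAI 0.15 gets neither code: it looks benign by REVEL but sits in SpliceAI's uncertain zone.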
6. Literature Search
Finding the paper is half the battle.
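- PubMed: The classic, but it searches titles, abstracts, and indexing terms rather than full text. Use Boolean operators (e.g., "Gene AND (Variant OR rsID)").
- Mastermind (Genomenon): A genomic search engine that indexes full-text articles, including supplementary material. It finds variants that PubMed misses because they appear only in the body or supplementary tables of a paper.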
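The Boolean pattern from the PubMed bullet can be generated programmatically and sent to NCBI's E-utilities `esearch` endpoint, which returns matching PubMed IDs. A minimal sketch using only the Python standard library; the helper names and the example gene and variant strings are hypothetical. Because PubMed does not search full text, an empty result does not mean the variant is unpublished.

```python
import json
import urllib.parse
import urllib.request
from typing import List

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene: str, variant_terms: List[str]) -> str:
    """Build 'GENE AND ("term1" OR "term2" ...)'; quotes request phrase matching."""
    alternatives = " OR ".join(f'"{term}"' for term in variant_terms)
    return f"{gene} AND ({alternatives})"


def search_pubmed(query: str, retmax: int = 20) -> List[str]:
    """Return up to retmax PubMed IDs (PMIDs) matching the query."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    )
    with urllib.request.urlopen(f"{ESEARCH_URL}?{params}", timeout=30) as resp:
        payload = json.load(resp)
    return payload["esearchresult"]["idlist"]
```

For example, `build_pubmed_query("GENE1", ["c.100A>G", "p.Lys34Glu"])` yields `GENE1 AND ("c.100A>G" OR "p.Lys34Glu")`. Listing the cDNA, protein, and rsID forms of the same variant matters because papers differ in which notation they use.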
7. Automation & APIs
Manual evidence gathering does not scale. Modern bioinformatics pipelines automate this phase, for example by querying NCBI's E-utilities:
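```python
# Example: Automating ClinVar lookup (Python)
import requests

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def get_clinvar_summary(variation_id):
    """Return the esummary JSON for one ClinVar Variation ID (numeric UID)."""
    params = {"db": "clinvar", "id": variation_id, "retmode": "json"}
    response = requests.get(ESUMMARY_URL, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()
```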
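Looking records up one at a time becomes slow for large variant lists. Because `esummary` accepts a comma-separated list of IDs, a batch version can fetch many ClinVar records per request. The sketch below uses only the standard library and pauses between requests to stay under NCBI's limit of three requests per second without an API key. The batch size is a hypothetical choice, and the JSON field names are an assumption based on the current response layout (`germline_classification`, falling back to the older `clinical_significance` block); check them against a live response before relying on this.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, Iterable

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
BATCH_SIZE = 100       # hypothetical chunk size that keeps request URLs short
PAUSE_SECONDS = 0.34   # roughly 3 requests/second, NCBI's limit without an API key


def summarize_record(record: dict) -> Dict[str, str]:
    """Extract title, classification, and review status from one esummary record."""
    # Assumed field names; older responses used "clinical_significance".
    block = record.get("germline_classification") or record.get("clinical_significance") or {}
    return {
        "title": record.get("title", ""),
        "classification": block.get("description", ""),
        "review_status": block.get("review_status", ""),
    }


def fetch_clinvar_batch(variation_ids: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Fetch summaries for many ClinVar Variation IDs, keyed by ID."""
    ids = [str(v) for v in variation_ids]
    summaries: Dict[str, Dict[str, str]] = {}
    for start in range(0, len(ids), BATCH_SIZE):
        chunk = ids[start:start + BATCH_SIZE]
        params = urllib.parse.urlencode(
            {"db": "clinvar", "id": ",".join(chunk), "retmode": "json"}
        )
        with urllib.request.urlopen(f"{ESUMMARY_URL}?{params}", timeout=30) as resp:
            result = json.load(resp).get("result", {})
        for uid in result.get("uids", []):
            summaries[uid] = summarize_record(result[uid])
        time.sleep(PAUSE_SECONDS)  # pace requests between batches
    return summaries
```

The returned dictionary maps each Variation ID to a compact summary, which can then be filtered by review status before any classification is reused.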