Academic Jobs

Nature Publishes Phenome-Wide Study of Copy Number Variants in 470,727 UK Biobank Genomes


Unlocking the Genome: CNVs and Their Broad Impact

In a landmark publication dated February 4, 2026, researchers unveiled a comprehensive phenome-wide analysis of copy number variants (CNVs) across 470,727 whole-genome sequences from the UK Biobank cohort. This study, featured prominently in Nature, represents a significant advancement in understanding how structural variations in DNA influence a vast array of human traits and diseases.

Copy number variants are segments of DNA whose number of copies differs from the usual two (one inherited from each parent). They can be deletions (fewer copies) or duplications (extra copies), and they range from thousands to millions of base pairs. Single nucleotide polymorphisms (SNPs) alter single bases. CNVs affect much larger genomic regions, so they can have larger functional effects, such as changes in gene dosage or disruption of regulatory elements.
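To make the deletion/duplication distinction concrete, here is a minimal Python sketch that labels a segment by its copy number relative to the diploid baseline of two. The `CNVCall` class, its field names and coordinate convention, and the example coordinates are all hypothetical. They are not any calling tool's output format.

```python
from dataclasses import dataclass

DIPLOID = 2  # expected autosomal copy number: one copy from each parent


@dataclass
class CNVCall:
    chrom: str
    start: int        # 0-based start coordinate (hypothetical convention)
    end: int          # end coordinate, exclusive
    copy_number: int  # estimated number of copies of this segment

    @property
    def length_bp(self) -> int:
        return self.end - self.start

    @property
    def kind(self) -> str:
        """Label the segment relative to the diploid baseline."""
        if self.copy_number < DIPLOID:
            return "deletion"
        if self.copy_number > DIPLOID:
            return "duplication"
        return "diploid"


# Made-up example: a 1.5 Mb segment present in three copies
call = CNVCall("chr1", 1_000_000, 2_500_000, 3)
print(call.kind, call.length_bp)  # duplication 1500000
```

Both a heterozygous loss (copy number 1) and a homozygous loss (copy number 0) count as deletions under this rule.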

The UK Biobank, a treasure trove of biomedical data from half a million UK participants aged 40-69 recruited between 2006 and 2010, provides the scale needed for such analyses. Its whole-genome sequencing (WGS) data, released progressively, allow rare variants, including rare CNVs, to be detected at unprecedented resolution.

Methodology: Precision in Detecting and Analyzing CNVs

The team used DRAGEN v.3.7.8 software to call germline CNVs on the autosomes and kept only CNVs larger than 10 kb to improve reliability. Rigorous quality control excluded low-coverage and contaminated samples, as well as error-prone regions such as segmental duplications. After QC, 102,717 unique deletions and 80,147 unique duplications remained. The great majority were rare: 99.8% had a frequency below 1%.

  • Sample filtering: removed participants who withdrew consent and samples with aneuploidies, and controlled for batch effects.
  • Variant QC: required a QUAL score above 35, merged overlapping calls, and validated calls in parent-child trios (4.1% Mendelian violation rate).
  • PheWAS models: dominant and recessive models applied to deletions and duplications, tested against 2,941 plasma proteins (measured in 49,736 individuals), 13,336 binary phenotypes, and 1,911 quantitative traits.
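The thresholds listed above can be illustrated with a short Python sketch. Only the numbers come from the article: 10 kb minimum size, QUAL above 35, and 1% frequency as the rarity cut-off. Everything else is a hypothetical simplification, not the study's actual pipeline. That includes the `VariantCall` record, the example calls, and the 0/1/2 allele-count encoding used for the dominant and recessive models.

```python
from dataclasses import dataclass

# Thresholds quoted in the article; everything else here is a hypothetical sketch.
MIN_LENGTH_BP = 10_000
MIN_QUAL = 35
RARE_FREQUENCY = 0.01


@dataclass
class VariantCall:
    variant_id: str
    length_bp: int
    qual: float
    frequency: float  # variant frequency in the cohort (hypothetical field)


def passes_qc(v: VariantCall) -> bool:
    """Keep calls larger than 10 kb with a QUAL score above 35."""
    return v.length_bp > MIN_LENGTH_BP and v.qual > MIN_QUAL


def is_rare(v: VariantCall) -> bool:
    return v.frequency < RARE_FREQUENCY


def encode(allele_count: int, model: str) -> int:
    """Encode 0, 1 or 2 copies of the CNV allele for a dominant or recessive test."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    if model == "dominant":
        return int(allele_count >= 1)  # any carrier counts
    if model == "recessive":
        return int(allele_count == 2)  # only two-copy carriers count
    raise ValueError(f"unknown model: {model}")


calls = [
    VariantCall("del_a", 25_000, 60.0, 0.0004),
    VariantCall("del_b", 8_000, 80.0, 0.002),   # fails: too short
    VariantCall("dup_c", 40_000, 20.0, 0.03),   # fails: low QUAL
    VariantCall("dup_d", 150_000, 50.0, 0.05),
]
kept = [v for v in calls if passes_qc(v)]
rare_fraction = sum(is_rare(v) for v in kept) / len(kept)
print([v.variant_id for v in kept], rare_fraction)  # ['del_a', 'dup_d'] 0.5
```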

Significance was set at P < 1 × 10⁻⁸. Multiancestry meta-analyses spanned six ancestry groups, including non-Finnish European (NFE, 94.77%), African (AFR), East Asian (EAS) and South Asian (SAS), and revealed ancestry-specific signals.
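The article does not say which meta-analysis method was used. As a hedged illustration, the sketch below applies a standard fixed-effect inverse-variance-weighted meta-analysis to per-ancestry effect estimates. It then compares the pooled P value with the 1 × 10⁻⁸ threshold quoted above. The function name and the example estimates are hypothetical. The estimates are assumed to be on a common scale, such as log odds ratios.

```python
import math

GENOME_WIDE_P = 1e-8  # significance threshold quoted in the article


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis (assumed method).

    betas: per-group effect estimates on a common scale (e.g. log odds ratios)
    ses:   their standard errors
    Returns the pooled effect, its standard error and a two-sided P value.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of betas and ses")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) for a standard normal
    return beta, se, p


# Hypothetical estimates from two ancestry groups
beta, se, p = ivw_meta([0.5, 0.5], [0.1, 0.1])
print(round(beta, 3), round(se, 4), p < GENOME_WIDE_P)  # 0.5 0.0707 True
```

The weighting scheme gives precise estimates, typically from the large NFE group, most of the influence on the pooled result. This helps explain why ancestry-specific signals can still be worth reporting separately.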

Figure: Pipeline for detecting copy number variants in UK Biobank genomes

Proteomic Revelations: Protein Levels and Interactions

The proteomic PheWAS confirmed the expected cis effects: deletions typically lowered levels of proteins encoded nearby, and duplications raised them. In total, 142 rare and 175 common CNV-protein associations emerged. These included trans-pQTLs, CNVs associated with proteins encoded elsewhere in the genome, which hint at protein-protein interactions and possible novel pathways.
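A simple way to picture the cis/trans split and the dosage pattern is sketched below in Python. The 1 Mb cis window is a hypothetical choice, because the article does not state the study's definition. The `Interval` type, the function names and the coordinates are also hypothetical.

```python
import math
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # hypothetical cis window, not taken from the study


@dataclass
class Interval:
    chrom: str
    start: int  # half-open [start, end)
    end: int


def gap_bp(a: Interval, b: Interval) -> float:
    """Base pairs separating two intervals: 0 if they overlap, inf across chromosomes."""
    if a.chrom != b.chrom:
        return math.inf
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def classify_pqtl(cnv: Interval, protein_gene: Interval) -> str:
    """Call an association cis if the CNV lies within the window of the protein's gene."""
    return "cis" if gap_bp(cnv, protein_gene) <= CIS_WINDOW_BP else "trans"


def dosage_consistent(kind: str, beta: float) -> bool:
    """Expected cis pattern: deletions lower protein levels, duplications raise them."""
    return (kind == "deletion" and beta < 0) or (kind == "duplication" and beta > 0)


gene = Interval("chr2", 5_000_000, 5_050_000)
print(classify_pqtl(Interval("chr2", 4_900_000, 4_980_000), gene))  # cis
print(classify_pqtl(Interval("chr7", 1_000_000, 1_200_000), gene))  # trans
print(dosage_consistent("deletion", -0.8))  # True
```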

These findings underscore that CNVs act on gene regulation, not only by disrupting coding sequence.

Clinical Phenotypes: From Rare Diseases to Common Conditions

The analysis identified 189 associations between CNVs and binary phenotypes. Hotspots included 16p11.2 (neurological conditions and obesity), 17p12 (Charcot-Marie-Tooth disease via PMP22 duplication, OR 1,324) and 21p11.2. Other examples include HNF1B duplication with chronic renal failure (OR 5.29) and an NME7 deletion associated with lower thrombophilia risk (OR 0.30).
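Odds ratios like those above compare the odds of disease in carriers with the odds in non-carriers. The Python sketch below computes a crude, unadjusted odds ratio and a Woolf (log-scale) 95% confidence interval from a 2×2 table of hypothetical counts. The study's published ORs may come from covariate-adjusted models, which this sketch does not reproduce. An OR below 1, as for the NME7 deletion, indicates lower odds among carriers.

```python
import math


def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls, z=1.96):
    """Crude odds ratio with a Woolf 95% confidence interval from a 2x2 table."""
    a, b, c, d = carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: 20 of 1,000 cases carry the CNV versus 1,000 of 99,000 controls
or_, lower, upper = odds_ratio(20, 1_000, 980, 98_000)
print(round(or_, 2))  # 2.0
print(round(lower, 2), round(upper, 2))
```

With very rare CNVs, the carrier cells are small, so the interval is wide. This is why extreme estimates such as the OR of 1,324 at PMP22 should be read alongside their confidence intervals.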

A further 892 associations with quantitative traits covered body measures, blood biomarkers, and more.

Novel Discoveries Highlighting CNV Diversity

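Standouts include a rare ZNF451 deletion associated with longer leukocyte telomere length, a possible link to cellular ageing. A deletion of an SLC2A9 enhancer was associated with lower gout risk (OR 0.80), pointing to uric acid pathways. PDZK1 duplication was associated with gout and higher urate levels. Non-coding examples such as the SLC2A9 enhancer deletion show that CNVs can act through regulatory elements, not just by changing gene copy number.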

Gene-level burden tests, which aggregate CNVs within each gene, uncovered an association between MSH2 and colorectal cancer (OR 192).
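Gene-level burden testing collapses every qualifying CNV that touches a gene into a single carrier indicator per person. Rare variants that are too infrequent to test individually can then be analysed together. The Python sketch below shows that collapsing step and the resulting case/control carrier counts. The overlap rule, the record layout, the gene coordinates and the sample IDs are all hypothetical. The article does not describe the study's exact aggregation criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    sample_id: str
    chrom: str
    start: int  # half-open [start, end)
    end: int
    kind: str   # "deletion" or "duplication"


def overlaps(cnv: CNV, chrom: str, start: int, end: int) -> bool:
    return cnv.chrom == chrom and cnv.start < end and start < cnv.end


def gene_carriers(cnvs, chrom, start, end, kind):
    """Samples carrying at least one CNV of the given kind that overlaps the gene."""
    return {c.sample_id for c in cnvs if c.kind == kind and overlaps(c, chrom, start, end)}


def carrier_counts(carriers, cases, controls):
    """2x2 counts for a burden test: carrier status by case/control status."""
    return {
        "carrier_cases": len(carriers & cases),
        "carrier_controls": len(carriers & controls),
        "noncarrier_cases": len(cases - carriers),
        "noncarrier_controls": len(controls - carriers),
    }


# Hypothetical gene on chr3 spanning [100_000, 180_000)
cnvs = [
    CNV("s1", "chr3", 90_000, 120_000, "deletion"),      # overlaps the gene start
    CNV("s2", "chr3", 150_000, 400_000, "deletion"),     # different deletion, same gene
    CNV("s3", "chr3", 500_000, 600_000, "deletion"),     # outside the gene
    CNV("s4", "chr3", 110_000, 130_000, "duplication"),  # wrong kind for a deletion burden
]
carriers = gene_carriers(cnvs, "chr3", 100_000, 180_000, "deletion")
counts = carrier_counts(carriers, cases={"s1", "s3"}, controls={"s2", "s4", "s5"})
print(sorted(carriers), counts)
```

The resulting 2×2 counts can then be tested in the same way as a single variant, for example with an odds ratio or an exact test.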


Boosting Power with Multi-Omics Integration

Combining CNVs with protein-truncating variants (PTVs) yielded 2,274 binary and 2,965 quantitative associations and helped pinpoint causal genes (e.g., HBB in thalassemia). This approach can detect dosage-sensitive genes that SNP-based analyses miss. Genes whose duplication drives disease are natural candidates for inhibitor drugs.
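One way to picture the combined analysis is as a per-gene union of carriers. Anyone with either a gene-disrupting deletion or a PTV counts as a loss-of-function carrier, which enlarges the carrier group and therefore the power to detect an effect. The Python sketch below assumes that deletions are pooled with PTVs while duplications are kept separate. The function name and sample IDs are hypothetical, and the article does not spell out the study's exact combination rule.

```python
def combine_lof(deletion_carriers: set[str], ptv_carriers: set[str]) -> dict[str, int]:
    """Pool deletion and PTV carriers for one gene, counting each person once (assumed rule)."""
    combined = deletion_carriers | ptv_carriers
    return {
        "deletion_only": len(deletion_carriers - ptv_carriers),
        "ptv_only": len(ptv_carriers - deletion_carriers),
        "both": len(deletion_carriers & ptv_carriers),
        "combined": len(combined),
    }


# Hypothetical carriers for a single gene
summary = combine_lof({"s1", "s2", "s3"}, {"s3", "s4", "s5", "s6"})
print(summary)
# {'deletion_only': 2, 'ptv_only': 3, 'both': 1, 'combined': 6}
```

In this toy example, the pooled mask has 6 carriers compared with 4 for PTVs alone. That gain is the source of the extra power described above.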


Multiancestry Insights: Beyond European-Centric Views

Meta-analyses found 12 binary and 175 quantitative associations specific to non-NFE groups, such as sickle-cell disease (AFR) and α-thalassemia (EAS). Including these underrepresented ancestries is vital for equitable genomics.

Implications for Precision Medicine and Drug Discovery

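CNVs offer both biomarkers (e.g., TMPRSS5 for CMT1A) and drug targets. Protective deletions suggest that therapeutically reducing the affected gene's function could be beneficial. Disease-causing duplications suggest targets for inhibitors. The dataset is a resource for therapeutic development and could strengthen polygenic risk models.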

The benefits reach several stakeholders: clinicians gain diagnostic tools, pharmaceutical companies gain candidate targets, and policymakers gain evidence of the value of biobanks.

UK Higher Education's Role in Genomic Frontiers

Many authors are based at AstraZeneca's Cambridge centre, with affiliations to the University of Cambridge (e.g., its Department of Haematology). UK Biobank also fosters collaborations with universities such as Manchester and Oxford. The study exemplifies UK leadership in genomics and could spur PhD and postdoctoral opportunities; see research jobs, and higher-ed jobs for faculty positions.

Students and academics can leverage such data for theses on structural variants, advancing careers in bioinformatics and medicine.


Challenges, Limitations, and Future Outlook

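  • Only autosomes were analyzed, and CNVs smaller than 10 kb were not captured.
  • In CNVs spanning several genes, causal genes are hard to separate from passenger genes.
  • The associations still require functional validation.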

Future work could integrate these results with single-cell data and expand ancestral diversity. UK initiatives such as Genomics England can build on this foundation.

For aspiring researchers, explore higher ed career advice or browse university jobs in genetics.


Conclusion: A New Era in Human Genetics

This PheWAS establishes CNVs as important contributors to health and disease and provides a foundation for precision medicine. UK academics are driving this work; see higher-ed jobs, research jobs, and career advice to get involved.

About the author

Gabrielle Ryan, Academic Jobs In House Author


Frequently Asked Questions

🧬What are copy number variants (CNVs)?

CNVs are DNA segments whose copy number differs from the diploid norm of two, such as deletions or duplications, and they affect gene dosage and regulation. The study explains them in full.

🔢How many genomes were analyzed in the UK Biobank CNV study?

470,727 whole-genome sequences underwent QC, yielding insights across ancestries.

⚙️What key methods detected CNVs?

CNVs were called from WGS data with DRAGEN software, keeping those larger than 10 kb. The PheWAS then tested them with dominant and recessive models.

💡What novel CNV-trait associations were found?

A ZNF451 deletion associated with longer telomeres and an SLC2A9 enhancer deletion associated with reduced gout risk. The study also confirmed known links such as PMP22 duplication with CMT1A.

🌍How does multiancestry analysis add value?

Reveals ancestry-specific signals like sickle-cell in AFR, improving global applicability.

🔗What are cis and trans pQTLs from the study?

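Cis pQTLs are CNVs associated with levels of proteins encoded nearby. Trans pQTLs are CNVs associated with proteins encoded elsewhere in the genome, which may reflect protein-protein interactions.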

💊Implications for drug discovery?

Genes whose duplication raises disease risk, such as PDZK1 in gout, are potential inhibitor targets. Proteins such as TMPRSS5 are candidate biomarkers.

🎓Role of UK universities in this research?

Several authors are affiliated with the University of Cambridge, and UK Biobank enables collaborations across UK universities.

⚠️Limitations of the CNV PheWAS?

Only autosomal CNVs larger than 10 kb were analyzed. The effects of CNVs spanning multiple genes are also hard to attribute to individual genes.

🚀Future directions post-study?

Next steps include integrating the results with single-cell data and expanding ancestral diversity to advance precision medicine.

📊How to access UK Biobank data?

Through UK Biobank's access process for approved researchers. Such access is valuable for genomics careers.