A genomic mutational constraint map using variation in 76,156 human genomes

Publication: Contribution to journal › Journal article › Research › peer-reviewed

  • Siwei Chen
  • Laurent C. Francioli
  • Julia K. Goodrich
  • Ryan L. Collins
  • Masahiro Kanai
  • Qingbo Wang
  • Jessica Alföldi
  • Nicholas A. Watts
  • Christopher Vittal
  • Laura D. Gauthier
  • Timothy Poterba
  • Michael W. Wilson
  • Yekaterina Tarasova
  • William Phu
  • Riley Grant
  • Mary T. Yohannes
  • Zan Koenig
  • Yossi Farjoun
  • Eric Banks
  • Stacey Donnelly
  • Stacey Gabriel
  • Namrata Gupta
  • Steven Ferriera
  • Charlotte Tolonen
  • Sam Novod
  • Louis Bergelson
  • David Roazen
  • Valentin Ruano-Rubio
  • Miguel Covarrubias
  • Christopher Llanwarne
  • Nikelle Petrillo
  • Gordon Wade
  • Thibault Jeandet
  • Ruchi Munshi
  • Kathleen Tibbetts
  • Ruth Loos
  • Konrad J. Karczewski
  • Genome Aggregation Database Consortium
The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders (refs. 1–4), but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
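The idea of "detecting depletions of variation" can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the score below is an assumption: a simple signed z-score comparing the number of variants observed in a genomic window with the number a mutational model would expect. The function name `constraint_z`, the window names and all counts are hypothetical and chosen only for illustration; they are not gnomAD data.

```python
import math


def constraint_z(observed: int, expected: float) -> float:
    """Signed depletion score for one genomic window.

    Assumed form (not taken from the abstract): (expected - observed) / sqrt(expected).
    Positive values mean fewer variants were observed than expected,
    which is the signature of constraint.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical windows: (observed variant count, expected count under the model).
windows = {
    "window_a": (60, 100.0),   # strongly depleted
    "window_b": (98, 100.0),   # close to expectation
    "window_c": (130, 100.0),  # more variation than expected
}

scores = {name: constraint_z(obs, exp) for name, (obs, exp) in windows.items()}

for name, z in scores.items():
    print(f"{name}: z = {z:.1f}")
```

In this toy setup, a window would be called constrained when its score is well above zero. In practice the expected count is the hard part, since it has to come from a mutational model of the kind the abstract describes (local sequence context plus regional genomic features).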
Original language: English
Journal: Nature
Volume: 625
Issue number: 7993
Pages (from-to): 92-100
Number of pages: 9
ISSN: 0028-0836
DOI
Status: Published - 2024

Bibliographical note

Funding Information:
The authors thank the individuals whose data is in gnomAD for their contributions to research. Development of the Genome Aggregation Database was supported by NIDDK U54DK105566 and the NHGRI of the National Institutes of Health under award number U24HG011450. Additional funding for Genome Aggregation Database Consortium members is listed in the . The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Nature Limited.

ID: 380298732