Published February 24, 2022 | Version v1
Journal article | Open access

Protein prediction for trait mapping in diverse populations

  • 1. Loyola University Chicago
  • 2. Broad Institute
  • 3. Beth Israel Deaconess Medical Center
  • 4. University of Southern California
  • 5. University of California Los Angeles Medical Center
  • 6. University of Vermont
  • 7. University of Washington
  • 8. Duke University
  • 9. University of Michigan
  • 10. National Heart, Lung and Blood Institute
  • 11. University of Chicago

Description

Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology now allows similar interrogation of the genetically regulated proteome to understand complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for the abundances of 1,305 proteins measured by a SOMAscan assay in TOPMed MESA. We compared predictive models built via elastic net regression to models that integrate posterior inclusion probabilities, estimated by fine-mapping SNPs, prior to elastic net. To investigate the transferability of predictive models across ancestries, we built protein prediction models in each of the four TOPMed MESA populations (African American, n = 183; Chinese, n = 71; European, n = 416; Hispanic/Latino, n = 301) as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestry populations, potentially increasing the opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestry prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The largest number of protein-trait associations was discovered, colocalized, and replicated in large independent GWAS when the proteome prediction models were trained in populations with ancestries similar to those in PAGE.
At current training population sample sizes, baseline and fine-mapped protein prediction models performed similarly in PWAS, highlighting the utility of elastic net. Our predictive models for diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.
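To make the baseline versus fine-mapped comparison concrete, here is a minimal sketch in plain Python. It fits an elastic net by cyclic coordinate descent (glmnet-style objective with a mixing parameter `alpha`) on toy genotype dosages. For the "fine-mapped" variant it drops SNPs whose posterior inclusion probability (PIP) falls at or below a cutoff before fitting. This is not the authors' pipeline. The functions `elastic_net` and `fit_protein_model`, the PIP cutoff, the penalty values, and the simulated data and PIPs are all hypothetical, and the paper may integrate PIPs with elastic net differently.

```python
import random

def soft_threshold(value, threshold):
    """Lasso shrinkage: sign(value) * max(|value| - threshold, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=100):
    """Cyclic coordinate descent for the glmnet-style objective
    (1/2n)||y - Xw||^2 + lam * (alpha*||w||_1 + (1 - alpha)/2 * ||w||^2).
    X is a list of samples (rows) by SNP dosages (columns). X and y are
    centered, so no intercept is returned. A fixed number of sweeps is
    run instead of a convergence check, to keep the sketch short."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals for w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Inner product of SNP j with the partial residual (residual plus SNP j's own contribution)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    return w

def fit_protein_model(X, y, pips=None, pip_cutoff=0.01, **kwargs):
    """Baseline model when pips is None; otherwise drop SNPs whose
    fine-mapping PIP is at or below a hypothetical cutoff, then fit.
    Returns {snp_index: weight} for nonzero weights only."""
    p = len(X[0])
    keep = list(range(p)) if pips is None else [j for j in range(p) if pips[j] > pip_cutoff]
    if not keep:
        return {}
    w = elastic_net([[row[j] for j in keep] for row in X], y, **kwargs)
    return {j: wj for j, wj in zip(keep, w) if wj != 0.0}

# Toy data: 10 SNPs (dosages 0/1/2); only SNP 0 affects the protein level.
random.seed(1)
n_samples, n_snps = 300, 10
X = [[sum(random.random() < 0.3 for _ in range(2)) for _ in range(n_snps)]
     for _ in range(n_samples)]
y = [0.8 * row[0] + random.gauss(0.0, 0.5) for row in X]
pips = [0.9] + [0.001] * (n_snps - 1)  # made-up fine-mapping PIPs

baseline = fit_protein_model(X, y)
fine_mapped = fit_protein_model(X, y, pips=pips)
print("baseline:", baseline)
print("fine-mapped:", fine_mapped)
```

In this toy setting the PIP filter mostly removes small noise weights that the baseline model may keep. That is one intuition for why the two approaches can perform similarly at modest sample sizes.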

Data availability

Models presented in the main text are available at https://doi.org/10.5281/zenodo.4837327. Code for the supporting figures and analyses is available at https://github.com/RyanSchu/TOPMed_protein_prediction. UKB GWAS summary statistics can be accessed via http://www.nealelab.is/uk-biobank/. Summary statistics for all other large European GWAS can be accessed through the GWAS Catalog; a list of studies can be found in S2 Table. Data from INTERVAL are under controlled access via the European Genome-phenome Archive (https://ega-archive.org/) for both genotypes (EGAD00010001544) and levels of 638 blood plasma aptamers measured by a SOMAscan assay (EGAD00001004080). PAGE GWAS summary statistics are available in the GWAS Catalog at https://www.ebi.ac.uk/gwas/publications/31217584. MESA data are under controlled access in dbGaP at https://www.ncbi.nlm.nih.gov/gap/; genotypes are available through accession phs000420.v6.p3, and protein data will be available through accession phs001416.v2.p1.
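As a hedged illustration of combining a published protein model with GWAS summary statistics such as those from PAGE, the sketch below computes the S-PrediXcan approximation for one protein. It uses Z ≈ Σ_l w_l (σ_l / σ_g) Z_l with σ_g² = wᵀΓw. Here w are the model weights, Z_l are the GWAS z-scores, and Γ is the SNP dosage covariance from a reference panel. It assumes each model file is an SQLite database with a PredictDB-style `weights` table (columns `gene`, `rsid`, `eff_allele`, `ref_allele`, `weight`); that schema is an assumption, so check the actual Zenodo files. The helper names and all data are hypothetical. For real analyses, use the official MetaXcan/S-PrediXcan software rather than this re-implementation.

```python
import math
import sqlite3

def load_weights(conn, protein):
    """Return {rsid: (eff_allele, ref_allele, weight)} for one protein.
    Assumes a PredictDB-style `weights` table; verify the real schema."""
    rows = conn.execute(
        "SELECT rsid, eff_allele, ref_allele, weight FROM weights WHERE gene = ?",
        (protein,),
    )
    return {rsid: (eff, ref, w) for rsid, eff, ref, w in rows}

def s_predixcan_z(weights, gwas, snp_cov):
    """Approximate PWAS z-score for one protein from summary statistics:
    Z = sum_l w_l * (sd_l / sd_g) * Z_l, with sd_g^2 = w' Gamma w.

    weights: {rsid: (eff_allele, ref_allele, weight)}
    gwas:    {rsid: (effect_allele, other_allele, z_score)}
    snp_cov: {(rsid_a, rsid_b): dosage covariance in a reference panel}
    """
    def cov(a, b):
        return snp_cov.get((a, b), snp_cov.get((b, a), 0.0))

    aligned = {}  # rsid -> (model weight, GWAS z on the model's effect allele)
    for rsid, (eff, ref, w) in weights.items():
        if rsid not in gwas:
            continue  # SNP missing from the GWAS is dropped
        g_eff, g_other, z = gwas[rsid]
        if (g_eff, g_other) == (eff, ref):
            aligned[rsid] = (w, z)
        elif (g_eff, g_other) == (ref, eff):
            aligned[rsid] = (w, -z)  # alleles swapped: flip the GWAS sign
    var_g = sum(aligned[a][0] * aligned[b][0] * cov(a, b)
                for a in aligned for b in aligned)
    if var_g <= 0.0:
        return None  # no usable SNPs or degenerate covariance
    sd_g = math.sqrt(var_g)
    return sum(w * math.sqrt(cov(s, s)) * z for s, (w, z) in aligned.items()) / sd_g

# Toy in-memory model database and summary statistics (all values made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weights "
             "(gene TEXT, rsid TEXT, eff_allele TEXT, ref_allele TEXT, weight REAL)")
conn.executemany("INSERT INTO weights VALUES (?, ?, ?, ?, ?)", [
    ("PROT_A", "rs1", "A", "G", 0.5),
    ("PROT_A", "rs2", "C", "T", -0.3),
])
gwas = {"rs1": ("A", "G", 4.0), "rs2": ("T", "C", 2.0)}
snp_cov = {("rs1", "rs1"): 0.42, ("rs2", "rs2"): 0.32, ("rs1", "rs2"): 0.05}

z_prot_a = s_predixcan_z(load_weights(conn, "PROT_A"), gwas, snp_cov)
z_missing = s_predixcan_z(load_weights(conn, "PROT_B"), gwas, snp_cov)
print(z_prot_a, z_missing)
```

Here rs2 is reported on the opposite allele in the toy GWAS, so its z-score is flipped before it enters the sum. A protein with no model in the database returns `None` instead of a score.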

Files

journal.pone.0264341.pdf

Files (9.4 MB)

  • 6.9 MB, md5:1e9b3b79595e7246246dd8fa4f0bdba8
  • Article, 2.5 MB, md5:2499ed601705742130a0cc03678164c4

Additional details

Identifiers

DOI
10.1371/journal.pone.0264341
Other
oai:uchicago.tind.io:6313

Funding

National Heart, Lung, and Blood Institute
75N92020D00001
National Heart, Lung, and Blood Institute
HHSN268201500003I
National Heart, Lung, and Blood Institute
N01-HC-95159
National Heart, Lung, and Blood Institute
75N92020D00005
National Heart, Lung, and Blood Institute
N01-HC-95160
National Heart, Lung, and Blood Institute
75N92020D00002
National Heart, Lung, and Blood Institute
N01-HC-95161
National Heart, Lung, and Blood Institute
75N92020D00003
National Heart, Lung, and Blood Institute
N01-HC-95162
National Heart, Lung, and Blood Institute
75N92020D00006
National Heart, Lung, and Blood Institute
N01-HC-95163
National Heart, Lung, and Blood Institute
75N92020D00004
National Heart, Lung, and Blood Institute
N01-HC-95164
National Heart, Lung, and Blood Institute
75N92020D00007
National Heart, Lung, and Blood Institute
N01-HC-95165
National Heart, Lung, and Blood Institute
N01-HC-95166
National Heart, Lung, and Blood Institute
N01-HC-95167
National Heart, Lung, and Blood Institute
N01-HC-95168
National Heart, Lung, and Blood Institute
N01-HC-95169
National Heart, Lung, and Blood Institute
UL1-TR-000040
National Heart, Lung, and Blood Institute
UL1-TR-001079
National Heart, Lung, and Blood Institute
UL1-TR-001420
National Heart, Lung, and Blood Institute
N02-HL-64278
National Heart, Lung, and Blood Institute
R01HL071051
National Heart, Lung, and Blood Institute
R01HL071205
National Heart, Lung, and Blood Institute
R01HL071250
National Heart, Lung, and Blood Institute
R01HL071251
National Heart, Lung, and Blood Institute
R01HL071258
National Heart, Lung, and Blood Institute
R01HL071259
National Center for Research Resources
UL1RR033176
National Center for Advancing Translational Sciences
UL1TR001881
National Institute of Diabetes and Digestive and Kidney Diseases, Diabetes Research Center
DK063491
Unknown funder
HHSN2682015000031
Unknown funder
HHSN26800004
Unknown funder
HHSN268201600034I

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Genetics, Genomics, and Systems Biology