Published December 30, 2016 | Version v1
Journal article Open

Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data

  • 1. National Cancer Institute
  • 2. Dongguk University
  • 3. University of Chicago
  • 4. Northern Illinois University
  • 5. Information Management Services, Inc.
  • 6. National Health Research Institutes
  • 7. University of Southern California
  • 8. Spanish National Cancer Research Centre
  • 9. Dartmouth College
  • 10. Imperial College London
  • 11. National Institute of Cancer Research

Description

Recent heritability analyses have indicated that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases based on the polygenic risk score (PRS), a simple modelling technique that can be implemented using summary-level data from the discovery samples. We herein propose modifications to improve the performance of PRS. We introduce threshold-dependent winner's-curse adjustments for the marginal association coefficients that are used to weight the single-nucleotide polymorphisms (SNPs) in PRS. Further, as a way to incorporate external functional/annotation knowledge that could identify subsets of SNPs highly enriched for associations, we propose variable thresholds for SNP selection. We applied our methods to GWAS summary-level data of 14 complex diseases. Across all diseases, a simple winner's curse correction uniformly improved model performance, whereas incorporation of functional SNPs was beneficial only for selected diseases. Compared to the standard PRS algorithm, the proposed methods in combination led to a notable gain in efficiency (25–50% increase in the prediction R²) for 5 of 14 diseases. As an example, for GWAS of type 2 diabetes, winner's curse correction improved prediction R² from 2.29% based on the standard PRS to 3.10% (P = 0.0017), and incorporating functional annotation data further improved R² to 3.53% (P = 2×10⁻⁵). Our simulation studies illustrate why differential treatment of certain categories of functional SNPs, even when they are shown to be highly enriched for GWAS heritability, does not lead to a proportionate improvement in genetic risk prediction: the linkage disequilibrium structure is non-uniform.
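
To make the idea in the description concrete, the following is a minimal sketch of a thresholded PRS with and without a shrinkage of the SNP weights. It is an illustration under stated assumptions, not the estimator used in the article: the soft-threshold rule, the function names (`standard_weights`, `soft_threshold_weights`, `polygenic_score`), and the toy effect sizes and genotypes are all hypothetical. The sketch also ignores linkage disequilibrium pruning and annotation-specific thresholds.

```python
import numpy as np
from statistics import NormalDist


def z_cutoff(p_threshold):
    """Two-sided z-score cutoff implied by a p-value threshold."""
    return NormalDist().inv_cdf(1.0 - p_threshold / 2.0)


def standard_weights(beta, se, p_threshold):
    """Standard PRS weights: keep the marginal estimate if the SNP passes the threshold."""
    selected = np.abs(beta / se) > z_cutoff(p_threshold)
    return np.where(selected, beta, 0.0)


def soft_threshold_weights(beta, se, p_threshold):
    """Assumed shrinkage rule (illustrative only): pull each selected estimate
    toward zero by the per-SNP selection cutoff on the effect-size scale."""
    lam = z_cutoff(p_threshold) * se
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)


def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (rows are individuals, columns are SNPs)."""
    return genotypes @ weights


# Toy summary statistics (hypothetical values, not from the article).
beta = np.array([0.30, -0.05, 0.12])
se = np.array([0.05, 0.05, 0.05])
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])

w_std = standard_weights(beta, se, p_threshold=0.05)
w_adj = soft_threshold_weights(beta, se, p_threshold=0.05)
prs_std = polygenic_score(genotypes, w_std)
prs_adj = polygenic_score(genotypes, w_adj)
```

In this toy setup, both rules drop the second SNP because it does not pass the threshold, and the shrunken weights are never larger in magnitude than the standard ones. SNPs that barely pass the threshold are shrunk the most relative to their size, which is where selection bias is expected to be strongest.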

Data availability

The GWAS genotype data are not publicly available, in order to protect patient privacy. Summary-level data or genotype data can be requested from dbGaP or from the specific GWAS consortia. Access to WTCCC data is available by application to the Wellcome Trust Case Control Consortium Data Access Committee following the link https://www.sanger.ac.uk/legal/DAA/MasterController. Access to the GWAS of pancreatic cancer can be requested through the PanC4 consortium (Email: eduell@iconcologia.net; Website: www.panc4.org). Access to the colorectal cancer GWAS data can be requested through the GECCO Consortium (Genetics and Epidemiology of Colorectal Cancer Consortium) (Dr. Ulrike Peters, Member, Fred Hutchinson Cancer Research Center. Email: upeters@fhcrc.org). Summary-level data for European lung cancer can be requested from the TRICL consortium (Transdisciplinary Research in Cancer of the Lung) (Dr. Christopher I Amos, Norris Cotton Cancer Center, Dartmouth College. Email: Christopher.I.Amos@dartmouth.edu). Summary-level data for prostate cancer GWAS can be requested from the PRACTICAL consortium (Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome. Website: http://practical.ccge.medschl.cam.ac.uk/) and the GAME-ON/ELLIPSE consortium (Elucidating Loci Involved in Prostate Cancer Susceptibility. Website: http://epi.grants.cancer.gov/gameon/index.html). Access to the following GWAS individual-level data can be requested through the dbGaP website (https://www.ncbi.nlm.nih.gov/gap): Female Lung Cancer Consortium in Asia (FLCCA), phs000716.v1.p1; bladder cancer, phs000346.v1.p1; Molecular Genetics of Schizophrenia, phs000167.v1.p1; Genetic Epidemiology Research on Adult Health and Aging (GERA), phs000674.v1.p1; Lung cancer GWAS in EAGLE (Environment and Genetics in Lung Cancer Etiology Study), phs000093.v2.p2.

Files

journal.pgen.1006493.pdf

Files (3.3 MB)

Name Size
Article
md5:7a67c59bf4b3b0f522f1ed70b9f349af
2.9 MB
md5:e1d0cb562f6e50a4cc26d91ee0bde765
472.6 kB

Additional details

Identifiers

DOI
10.1371/journal.pgen.1006493
Other
oai:uchicago.tind.io:6760

Funding

National Institutes of Health
Intramural Research program
National Institutes of Health
U19 CA148127
National Cancer Institute
U01 CA137088
National Cancer Institute
R01 CA059045
Regional Council of Pays de la Loire
Groupement des Entreprises Françaises dans la Lutte contre le Cancer
Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer
National Institutes of Health
R01 CA60987
German Research Council
BR 1704/6-1
German Research Council
BR 1704/6-3
German Research Council
BR 1704/6-4
German Research Council
CH 117/1-1
German Federal Ministry of Education and Research
01KH0404
German Federal Ministry of Education and Research
01ER0814
National Institutes of Health
R01 CA48998
National Institutes of Health
P01 CA 055075
National Institutes of Health
UM1 CA167552
National Institutes of Health
R01 137178
National Institutes of Health
R01 CA151993
National Institutes of Health
P50 CA127003
National Institutes of Health
UM1 CA186107
National Institutes of Health
R01 CA137178
National Institutes of Health
P01 CA87969
National Institutes of Health
R01 CA042182
National Institutes of Health
R37 CA54281
National Institutes of Health
P01 CA033619
National Institutes of Health
R01 CA63464
National Institutes of Health
U01 CA074783
Ontario Research Fund
Canadian Institutes of Health Research
Ontario Institute for Cancer Research
Ontario Ministry of Research and Innovation
National Institutes of Health
R01 CA076366
National Institutes of Health
K05 CA154337
National Heart, Lung, and Blood Institute
HHSN268201100046C
National Heart, Lung, and Blood Institute
HHSN268201100001C
National Heart, Lung, and Blood Institute
HHSN268201100002C
National Heart, Lung, and Blood Institute
HHSN268201100003C
National Heart, Lung, and Blood Institute
HHSN268201100004C
National Heart, Lung, and Blood Institute
HHSN271201100004C

UChicago Information

Division(s)
Pritzker School of Medicine