Published January 20, 2023 | Version v1
Journal article Open

Sequencing-based fine-mapping and in silico functional characterization of the 10q24.32 arsenic metabolism efficiency locus across multiple arsenic-exposed populations

  • 1. University of Chicago
  • 2. Missouri Breaks Industries Research Inc
  • 3. Texas Biomedical Research Institute
  • 4. University of Minnesota
  • 5. MedStar Health Research Institute
  • 6. Columbia University
  • 7. Dartmouth College

Description

Inorganic arsenic is highly toxic and carcinogenic to humans. Exposed individuals vary in their ability to metabolize arsenic, and variability in arsenic metabolism efficiency (AME) is associated with risks of arsenic-related toxicities. Inherited genetic variation in the 10q24.32 region, near the arsenic methyltransferase (AS3MT) gene, is associated with urine-based measures of AME in multiple arsenic-exposed populations. To identify potential causal variants in this region, we applied fine-mapping approaches to targeted sequencing data generated for exposed individuals from Bangladeshi, American Indian, and European American populations (n = 2,357, 557, and 648, respectively). We identified three independent association signals for Bangladeshis, two for American Indians, and one for European Americans. The size of the confidence sets for each signal varied from 4 to 85 variants. There was one signal shared across all three populations, represented by the same SNP in American Indians and European Americans (rs191177668) and in strong linkage disequilibrium (LD) with a lead SNP in Bangladesh (rs145537350). Beyond this shared signal, differences in LD patterns, minor allele frequency (MAF) (e.g., rs12573221: ~13% in Bangladesh vs. ~0.2% among American Indians), and/or heterogeneity in effect sizes across populations likely contributed to the apparent population specificity of the additional identified signals. One of our potential causal variants influences AS3MT expression and nearby DNA methylation in numerous GTEx tissue types (with rs4919690 as a likely causal variant). Several SNPs in our confidence sets overlap transcription factor binding sites and cis-regulatory elements (from ENCODE). Taken together, our analyses reveal multiple potential causal variants in the 10q24.32 region influencing AME, including a variant shared across populations, and elucidate potential biological mechanisms underlying the impact of genetic variation on AME.
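As a rough illustration of what a per-signal confidence set is, the sketch below builds one in a common generic way: variants are ranked by posterior inclusion probability (PIP) and added until a target coverage is reached. This is an assumption made for illustration only; the description above does not state which fine-mapping method or coverage level was used, and the function name, the 0.95 threshold, the variant labels, and the PIP values are all hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Return the smallest top-ranked set of variants whose normalized
    posterior inclusion probabilities (PIPs) sum to at least `coverage`.

    `pips` maps a variant label to its PIP for a single association signal.
    Generic illustration only, not the method used in the study.
    """
    total = sum(pips.values())
    if total <= 0:
        raise ValueError("PIPs must sum to a positive value")

    # Rank variants from most to least likely to be causal.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)

    chosen = []
    cumulative = 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip / total  # normalize so the PIPs sum to 1
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical PIPs for one signal; labels are placeholders, not real rsIDs.
example_pips = {
    "variant_a": 0.60,
    "variant_b": 0.25,
    "variant_c": 0.12,
    "variant_d": 0.03,
}
print(credible_set(example_pips))
```

Under this construction, a signal whose probability is concentrated on a few variants yields a small set, while a signal spread over many variants in strong LD yields a large one, which is one way set sizes can range as widely as 4 to 85 variants.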

Data availability

All summary statistics generated in this study are included within the manuscript and its supporting information files. Individual-level data underlying the results presented in the study can be requested by email. Requests will then be routed to the three individual studies, as each study has its own mechanism for data access. Normalized expression matrices, summary statistics for eQTLs and mQTLs, and covariates used for QTL mapping are available at the GTEx Portal (https://gtexportal.org/home/datasets). Normalized DNAm data are available at GEO (GSE213478); access to the raw DNAm data is provided through the AnVIL platform (https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_GTEx_V9_hg38). All GTEx protected data are available via dbGaP (phs000424.v9 and phs000424.v8.p2). GTEx whole genome sequencing data can be requested through dbGaP (https://gtexportal.org/home/protectedDataAccess).

Files

Sequencing-based-fine-mapping-and-in-silico-functional-characterization.pdf

Files (12.8 MB)

Article (2.8 MB)
md5:33c5ebf9a75681e311d02274c4325da7
Supporting information files (10.1 MB)
md5:c1f029560a10e5b90b3bad75c948a20d
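
The md5 values listed for each file can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path in the usage comment are hypothetical, and the record does not prescribe any particular verification tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a value written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Example usage (hypothetical local path for the downloaded article PDF):
# ok = matches_listed_checksum("article.pdf",
#                              "md5:33c5ebf9a75681e311d02274c4325da7")
# print("checksum OK" if ok else "checksum mismatch")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.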

Additional details

Identifiers

DOI
10.1371/journal.pgen.1010588
Other
oai:uchicago.tind.io:5456

Funding

National Institutes of Health
R01HL109315
National Institutes of Health
R01HL109301
National Institutes of Health
R01HL109284
National Institutes of Health
R01HL109282
National Institutes of Health
R01HL109319
National Institutes of Health
R01HL090863
National Institutes of Health
U01HL41642
National Institutes of Health
U01HL41652
National Institutes of Health
U01HL41654
National Institutes of Health
U01HL65520
National Institutes of Health
U01HL65521
National Institute of Environmental Health Sciences
P42ES033719
National Institute of Environmental Health Sciences
R01ES032638
National Institute of Environmental Health Sciences
R01ES021367
National Institute of Environmental Health Sciences
R35ES028379-03S1
National Institute of General Medical Sciences
T73M007281
National Institute of Environmental Health Sciences
5F30ES031858-02
Susan G. Komen Breast Cancer Foundation
Research Training Grant
National Institute on Aging
T32AG51146-5

UChicago Information

Division(s)
Biological Sciences Division, Pritzker School of Medicine
Department(s)
Human Genetics, Medicine, Public Health Sciences
Center(s) or Institute(s)
Center for Research Informatics