nQuack: An R package for predicting ploidal level from sequence data using site-based heterozygosity

Gaynor, Michelle L.; Landis, Jacob B.; O'Connor, Timothy K.; Laport, Robert G.; Doyle, Jeff J.; Soltis, Douglas E.; Ponciano, José Miguel; Soltis, Pamela S.

doi:10.6082/8pjah-7mg02

Published July 14, 2024 | Version v1

Journal article | Open access

1. University of Florida
2. Cornell University
3. University of Chicago
4. College of Idaho

Premise: Traditional methods of ploidal-level estimation are tedious; using DNA sequence data for cytotype estimation is an ideal alternative. Multiple statistical approaches to leverage sequence data for ploidy inference based on site-based heterozygosity have been developed. However, these approaches may require high-coverage sequence data, use inappropriate probability distributions, or have additional statistical shortcomings that limit inference abilities. We introduce nQuack, an open-source R package that addresses the main shortcomings of current methods.

Methods and Results: nQuack performs model selection for improved ploidy predictions. Here, we implement expectation maximization algorithms with normal, beta, and beta-binomial distributions. Using extensive computer simulations that account for variability in sequencing depth, as well as real data sets, we demonstrate the utility and limitations of nQuack.
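The following is a minimal Python sketch of the general idea: fit one mixture per candidate ploidy by expectation maximization, then choose among the fits. It is not nQuack's implementation. nQuack is an R package and also offers beta and beta-binomial mixtures. The sketch makes several assumptions:

- Allele ratios at heterozygous sites of a k-ploid cluster near i/k.
- Only normal mixtures are fitted. Their component means are held fixed, and only the mixing weights and a single shared standard deviation are estimated.
- BIC is used as the selection criterion.

The helper names (`em_fixed_means`, `select_ploidy`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random

# Hypothetical ploidy models: heterozygous-site allele ratios in a k-ploid are
# assumed to cluster near i/k (i = 1..k-1). Not nQuack's exact model set.
EXPECTED_PEAKS = {
    "diploid": [1 / 2],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [1 / 4, 1 / 2, 3 / 4],
}


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(freqs, means, weights, sigma):
    total = 0.0
    for x in freqs:
        mix = sum(w * normal_pdf(x, m, sigma) for w, m in zip(weights, means))
        total += math.log(max(mix, 1e-300))  # guard against underflow
    return total


def em_fixed_means(freqs, means, n_iter=300, tol=1e-8):
    """EM for a normal mixture whose component means are fixed.

    Only the mixing weights and one shared standard deviation are estimated
    (a simplifying assumption for this sketch).
    """
    k, n = len(means), len(freqs)
    weights, sigma = [1.0 / k] * k, 0.1
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each site belongs to each peak.
        resp = []
        for x in freqs:
            dens = [w * normal_pdf(x, m, sigma) for w, m in zip(weights, means)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the mixing weights and the shared variance.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        var = sum(
            r[j] * (x - means[j]) ** 2
            for x, r in zip(freqs, resp)
            for j in range(k)
        ) / n
        sigma = max(math.sqrt(var), 1e-6)
        ll = log_likelihood(freqs, means, weights, sigma)
        if ll - prev_ll < tol:  # EM never decreases ll; stop when gains vanish
            break
        prev_ll = ll
    return ll, weights, sigma


def select_ploidy(freqs):
    """Fit every ploidy model and return the name with the lowest BIC."""
    n = len(freqs)
    scores = {}
    for name, means in EXPECTED_PEAKS.items():
        ll, _, _ = em_fixed_means(freqs, means)
        n_params = len(means)  # (k - 1) free weights + 1 shared sigma
        scores[name] = n_params * math.log(n) - 2.0 * ll
    return min(scores, key=scores.get), scores


def simulate(peaks, n_sites=500, sd=0.03, seed=1):
    """Toy allele ratios drawn around the given peaks (not real read data)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(peaks), sd) for _ in range(n_sites)]


best, scores = select_ploidy(simulate([1 / 3, 2 / 3]))
print(best, scores)
```

The tetraploid model includes a component at 1/2, so it can fit diploid data almost as well as the diploid model does. In this sketch, only the BIC penalty on its extra parameters tips the choice back to diploid. Overlap of this kind is one illustration of why site-based heterozygosity is weak evidence of ploidy on its own.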

Conclusions: Inferring ploidy based on site-based heterozygosity alone is difficult. Even though nQuack is more accurate than similar methods, we suggest caution when relying on any site-based heterozygosity method to infer ploidy.

Data availability

The R package nQuack is available at https://github.com/mgaynor1/nQuack and https://mlgaynor.com/nQuack/. The package documentation includes a full implementation tutorial (https://mlgaynor.com/nQuack/articles/BasicExample.html). It also includes detailed tutorials on data preparation (https://mlgaynor.com/nQuack/articles/DataPreparation.html) and model inference (https://mlgaynor.com/nQuack/articles/ModelOptions.html). For three sample sets, reference genomes and population genetics data are available via open repositories (see Appendices S3 and S4 for accessions). Sequence data for Galax urceolata and Larrea tridentata will be made available in open repositories with future publications. Our GitHub (https://mlgaynor.com/nQuack/articles/BasicExample.html) provides an exemplar data set and the output of each step of our method. It also reports the processing time required for each step of model implementation: 1.46–2.09 s for models with the normal distribution, 6.41–23.16 min for models with the beta distribution, and 9.54–46.15 min for models with the beta-binomial distribution.

Files (4.1 MB)

nQuack.pdf (Article), 932.4 kB, md5:a64a84bc0c8bba5c58b5477c70e05087
Supporting-information.zip (Supporting information files), 3.1 MB, md5:e3f2d806af3731d37748a49da721a21c

Additional details

Identifiers

DOI: 10.1002/aps3.11606
Other: oai:uchicago.tind.io:12832

Funding

National Science Foundation: Graduate Research Fellowship
National Science Foundation: Small Grant
National Science Foundation: Plant Genome Fellowship
National Institute of Food and Agriculture, U.S. Department of Agriculture: Hatch award

UChicago Information

Division(s): Biological Sciences Division
Department(s): Ecology and Evolution
