nQuack: An R package for predicting ploidal level from sequence data using site-based heterozygosity
Creators
- 1. University of Florida
- 2. Cornell University
- 3. University of Chicago
- 4. College of Idaho
Description
Premise: Traditional methods of ploidal-level estimation are tedious; using DNA sequence data for cytotype estimation is an ideal alternative. Multiple statistical approaches to leverage sequence data for ploidy inference based on site-based heterozygosity have been developed. However, these approaches may require high-coverage sequence data, use inappropriate probability distributions, or have additional statistical shortcomings that limit inference abilities. We introduce nQuack, an open-source R package that addresses the main shortcomings of current methods.
Methods and Results: nQuack performs model selection for improved ploidy predictions. Here, we implement expectation maximization algorithms with normal, beta, and beta-binomial distributions. Using extensive computer simulations that account for variability in sequencing depth, as well as real data sets, we demonstrate the utility and limitations of nQuack.
Conclusions: Inferring ploidy based on site-based heterozygosity alone is difficult. Even though nQuack is more accurate than similar methods, we suggest caution when relying on any site-based heterozygosity method to infer ploidy.
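The expectation-maximization and model-selection idea summarized under Methods and Results can be sketched as follows. This is a minimal illustration, not nQuack's implementation or API. It is written in Python rather than R and covers only the normal distribution. It assumes that allele frequencies at heterozygous sites cluster around multiples of 1/ploidy, and it uses BIC as the selection criterion. All function and variable names (`fit_fixed_mean_mixture`, `select_ploidy`, `simulate_sites`, `EXPECTED_MEANS`) are hypothetical, and the input data are simulated.

```python
import math
import random

# Assumed expected allele-frequency peaks at heterozygous sites
# (multiples of 1/ploidy, excluding 0 and 1). Illustrative only.
EXPECTED_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def log_likelihood(freqs, means, weights, sd):
    total = 0.0
    for x in freqs:
        dens = sum(w * normal_pdf(x, m, sd) for w, m in zip(weights, means))
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def fit_fixed_mean_mixture(freqs, means, n_iter=500, tol=1e-9):
    """EM for a normal mixture with fixed means, free weights, one shared sd."""
    k, n = len(means), len(freqs)
    weights, sd = [1.0 / k] * k, 0.1
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each site.
        resp = []
        for x in freqs:
            dens = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: update mixing weights and the shared standard deviation.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        var = sum(r[j] * (x - means[j]) ** 2
                  for x, r in zip(freqs, resp) for j in range(k)) / n
        sd = max(math.sqrt(var), 0.01)  # floor keeps the density finite
        ll = log_likelihood(freqs, means, weights, sd)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return ll, weights, sd


def select_ploidy(freqs):
    """Fit each candidate model and return the one with the lowest BIC."""
    n = len(freqs)
    bic = {}
    for name, means in EXPECTED_MEANS.items():
        ll, _, _ = fit_fixed_mean_mixture(freqs, means)
        n_params = len(means)  # (k - 1) free weights + 1 shared sd
        bic[name] = n_params * math.log(n) - 2.0 * ll
    return min(bic, key=bic.get), bic


def simulate_sites(means, n_sites=600, sd=0.04, seed=1):
    """Simulated allele frequencies; stands in for real per-site data."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(means), sd) for _ in range(n_sites)]


if __name__ == "__main__":
    freqs = simulate_sites(EXPECTED_MEANS["tetraploid"])
    best, bic = select_ploidy(freqs)
    print(best, {name: round(value, 1) for name, value in bic.items()})
```

The simulated data here are clean and well separated, which real data rarely are. This is one reason the Conclusions urge caution with any site-based heterozygosity method.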
Data availability
The R package nQuack is available at https://github.com/mgaynor1/nQuack and https://mlgaynor.com/nQuack/. A full implementation tutorial (https://mlgaynor.com/nQuack/articles/BasicExample.html), as well as detailed tutorials on data preparation (https://mlgaynor.com/nQuack/articles/DataPreparation.html) and model inference (https://mlgaynor.com/nQuack/articles/ModelOptions.html), are available with the package documentation. For three sample sets, reference genomes and population genetics data are available via open repositories (see Appendix S3 and S4 for accessions). Sequence data for Galax urceolata and Larrea tridentata will be published in open repositories with future publications. An exemplar data set and processing times required for every step of model implementation (1.46–2.09 s for models with the normal distribution; 6.41–23.16 min for models with the beta distribution; 9.54–46.15 min for models with the beta-binomial distribution), as well as the output of each step of our method, are available on our GitHub (https://mlgaynor.com/nQuack/articles/BasicExample.html).
Files (4.1 MB)
| Name | MD5 | Size |
|---|---|---|
| Article | a64a84bc0c8bba5c58b5477c70e05087 | 932.4 kB |
| Supporting information files | e3f2d806af3731d37748a49da721a21c | 3.1 MB |
Additional details
Identifiers
- DOI: 10.1002/aps3.11606
- Other: oai:uchicago.tind.io:12832
Funding
- National Science Foundation: Graduate Research Fellowship
- National Science Foundation: Small Grant
- National Science Foundation: Plant Genome Fellowship
- National Institute of Food and Agriculture, U.S. Department of Agriculture: Hatch award