distAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data

Cited by: 1
Authors
Zhao, Lei [1]
Nielsen, Rasmus [1, 2, 3]
Korneliussen, Thorfinn Sand [1]
Affiliations
[1] Univ Copenhagen, Globe Inst, Sect Geogenet, Oster Voldgade 5-7, DK-1350 Copenhagen K, Denmark
[2] Univ Calif Berkeley, Dept Integrat Biol, 3040 Valley Life Sci Bldg 3140, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Stat, 3040 Valley Life Sci Bldg 3140, Berkeley, CA 94720 USA
Keywords
phylogeny reconstruction; genotype likelihood; genetic distance; high-throughput sequencing; next-generation sequencing; molecular evolution; maximum likelihood; expectation maximization; HAPLOTYPE RECONSTRUCTION; MAXIMUM-LIKELIHOOD; DNA; MITOCHONDRIAL; SITES; SUBSTITUTIONS; ASSOCIATION; FRAMEWORK; GENOTYPE; GENOMES;
DOI
10.1093/molbev/msac119
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and generally cannot accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters, while the uncertainty in genotype calling, which arises as a consequence of the sequencing technology, is ignored. To address this problem, we describe two new probabilistic approaches for estimating genetic distances, distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and show more stable statistical behavior than other currently available methods for estimating genetic distances, even for very low-depth data with high error rates.
Pages: 1084-1097
Page count: 14