Data-driven guidelines for phylogenomic analyses using SNP data

Cited by: 0
Authors
Suissa, Jacob S. [1]
de la Cerda, Gisel Y. [2]
Graber, Leland C. [3]
Jelley, Chloe [3]
Wickell, David [2, 4]
Phillips, Heather R. [2]
Grinage, Ayress D. [2, 5]
Moreau, Corrie S. [3, 5]
Specht, Chelsea D. [2]
Doyle, Jeff J. [2]
Landis, Jacob B. [2, 6]
Affiliations
[1] Univ Tennessee Knoxville, Dept Ecol & Evolutionary Biol, Knoxville, TN 37996 USA
[2] Cornell Univ, Sch Integrat Plant Sci, Sect Plant Biol & L H Bailey Hortorium, Ithaca, NY USA
[3] Cornell Univ, Dept Entomol, Ithaca, NY 14853 USA
[4] Boyce Thompson Inst Plant Res, Ithaca, NY 14853 USA
[5] Cornell Univ, Dept Ecol & Evolutionary Biol, Ithaca, NY USA
[6] Boyce Thompson Inst Plant Res, BTI Computat Biol Ctr, Ithaca, NY USA
Funding
National Institute of Food and Agriculture (USA); National Science Foundation (USA)
Keywords
ancestral state reconstructions; divergence time estimation; genotyping-by-sequencing (GBS); Glycine; locus; phylogenetic comparative methods; single-nucleotide polymorphism (SNP) filtering; MISSING DATA; GLYCINE; TREE; IMPACT; LIKELIHOOD; EVOLUTION; SEQUENCE; TAXA; BIAS;
DOI
10.1002/aps3.11611
Chinese Library Classification number
Q94 [Botany]
Subject classification code
071001
Abstract
Premise: There is a general lack of consensus on the best practices for filtering of single-nucleotide polymorphisms (SNPs) and whether it is better to use SNPs or include flanking regions (full "locus") in phylogenomic analyses and subsequent comparative methods.

Methods: Using genotyping-by-sequencing data from 22 Glycine species, we assessed the effects of SNP vs. locus usage and SNP retention stringency. We compared branch length, node support, and divergence time estimation across 16 datasets with varying amounts of missing data and total size.

Results: Our results revealed five aspects of phylogenomic data usage that may be generally applicable: (1) tree topology is largely congruent across analyses; (2) filtering strictly for SNP retention (e.g., 90-100%) reduces support and can alter some inferred relationships; (3) absolute branch lengths vary by two orders of magnitude between SNP and locus datasets; (4) data type and branch length variation have little effect on divergence time estimation; and (5) phylograms alter the estimation of ancestral states and rates of morphological evolution.

Discussion: Using SNP or locus datasets does not alter phylogenetic inference significantly, unless researchers want or need to use absolute branch lengths. We recommend against using excessive filtering thresholds for SNP retention to reduce the risk of producing inconsistent topologies and generating low support.
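To make the idea of an SNP retention threshold concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that "retention" means the minimum fraction of samples that must have a non-missing call at a site for that site to be kept; the abstract does not define the term, so this reading is an assumption. The function names, the `N` missing-data symbol, and the toy sample names are all hypothetical.

```python
from typing import Dict

MISSING = "N"  # assumed symbol for a missing genotype call


def site_retention(column: str) -> float:
    """Fraction of samples with a non-missing call at one SNP site."""
    return sum(base != MISSING for base in column) / len(column)


def filter_snps(alignment: Dict[str, str], min_retention: float) -> Dict[str, str]:
    """Keep only the sites called in at least `min_retention` of the samples."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    keep = [
        i
        for i in range(n_sites)
        if site_retention("".join(alignment[n][i] for n in names)) >= min_retention
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Toy SNP matrix: 4 hypothetical samples x 5 sites
toy = {
    "sample_A": "ACGTN",
    "sample_B": "ACNTN",
    "sample_C": "ANNTA",
    "sample_D": "ACGTN",
}

relaxed = filter_snps(toy, 0.5)  # keeps sites called in at least half the samples
strict = filter_snps(toy, 1.0)   # keeps only sites with no missing calls
print(len(relaxed["sample_A"]), len(strict["sample_A"]))
```

In this toy matrix the stricter threshold leaves fewer sites, which illustrates the trade-off the abstract describes: less missing data, but also less data overall for inferring and supporting relationships.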
Pages: 17