Data-driven guidelines for phylogenomic analyses using SNP data

Cited by: 0
Authors
Suissa, Jacob S. [1]
de la Cerda, Gisel Y. [2]
Graber, Leland C. [3]
Jelley, Chloe [3]
Wickell, David [2,4]
Phillips, Heather R. [2]
Grinage, Ayress D. [2,5]
Moreau, Corrie S. [3,5]
Specht, Chelsea D. [2]
Doyle, Jeff J. [2]
Landis, Jacob B. [2,6]
Affiliations
[1] University of Tennessee Knoxville, Department of Ecology & Evolutionary Biology, Knoxville, TN 37996, USA
[2] Cornell University, School of Integrative Plant Science, Section of Plant Biology & L. H. Bailey Hortorium, Ithaca, NY, USA
[3] Cornell University, Department of Entomology, Ithaca, NY 14853, USA
[4] Boyce Thompson Institute for Plant Research, Ithaca, NY 14853, USA
[5] Cornell University, Department of Ecology & Evolutionary Biology, Ithaca, NY, USA
[6] Boyce Thompson Institute for Plant Research, BTI Computational Biology Center, Ithaca, NY, USA
Funding
National Institute of Food and Agriculture (USA); National Science Foundation (USA)
Keywords
ancestral state reconstructions; divergence time estimation; genotyping-by-sequencing (GBS); Glycine; locus; phylogenetic comparative methods; single-nucleotide polymorphism (SNP) filtering; MISSING DATA; GLYCINE; TREE; IMPACT; LIKELIHOOD; EVOLUTION; SEQUENCE; TAXA; BIAS;
DOI
10.1002/aps3.11611
Chinese Library Classification
Q94 [Botany]
Subject classification code
071001
Abstract
Premise: There is no general consensus on best practices for filtering single-nucleotide polymorphisms (SNPs). Nor is there consensus on whether phylogenomic analyses and downstream comparative methods should use SNPs alone or include their flanking regions (the full "locus").

Methods: Using genotyping-by-sequencing data from 22 Glycine species, we assessed the effects of using SNPs vs. loci and the effects of SNP retention stringency. We compared branch lengths, node support, and divergence time estimates across 16 datasets that differed in the amount of missing data and in total size.

Results: Our results revealed five aspects of phylogenomic data usage that may be generally applicable:
(1) tree topology is largely congruent across analyses;
(2) filtering strictly for SNP retention (e.g., 90-100%) reduces support and can alter some inferred relationships;
(3) absolute branch lengths differ by two orders of magnitude between SNP and locus datasets;
(4) data type and branch length variation have little effect on divergence time estimates; and
(5) phylograms alter estimates of ancestral states and rates of morphological evolution.

Discussion: The choice between SNP and locus datasets does not significantly alter phylogenetic inference, unless researchers want or need to use absolute branch lengths. We recommend against excessively strict SNP retention thresholds, because they increase the risk of inconsistent topologies and low support.
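The abstract contrasts lenient and strict SNP retention thresholds (e.g., 90-100%). The sketch below is a minimal illustration built on several assumptions.

- "SNP retention" is read here as the minimum fraction of samples that must have a called base at a site. The abstract does not define the term, and some pipelines apply the threshold per locus rather than per site.
- The aligned matrix is a Python dict that maps sample names to sequences.
- 'N', '-', and '?' mark missing data.
- The function names `site_occupancy` and `filter_by_retention` are hypothetical and do not come from the study's pipeline.

```python
from typing import Dict, List

# Assumed missing-data symbols; real pipelines may use other codes.
MISSING = frozenset("N-?")


def site_occupancy(alignment: Dict[str, str]) -> List[float]:
    """Return, for each column, the fraction of samples with a called (non-missing) base."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    n = len(seqs)
    return [
        sum(1 for s in seqs if s[i].upper() not in MISSING) / n
        for i in range(length)
    ]


def filter_by_retention(alignment: Dict[str, str], min_occupancy: float) -> Dict[str, str]:
    """Hypothetical filter: keep only columns called in at least `min_occupancy` (0-1) of samples."""
    keep = [i for i, f in enumerate(site_occupancy(alignment)) if f >= min_occupancy]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy four-sample alignment (made-up data, not from the study).
    aln = {"A": "ACGTN", "B": "AC-TA", "C": "ANGTA", "D": "ACGTA"}
    for threshold in (0.5, 0.75, 0.9, 1.0):
        kept = filter_by_retention(aln, threshold)
        print(threshold, len(kept["A"]))
```

In this toy alignment, three of the five columns are called in only 75% of samples. A 0.75 threshold therefore keeps all five columns, whereas a 0.9 threshold keeps only two. This is a small-scale picture of how strict thresholds shrink a dataset.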
Pages: 17