ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs

Cited by: 5
Authors
Sun, Mingzhu
Pang, Erli
Bai, Wei-Ning
Zhang, Da-Yong
Lin, Kui [1, 2]
Affiliations
[1] Beijing Normal Univ, State Key Lab Earth Surface Proc & Resource Ecol, Beijing, Peoples R China
[2] Beijing Normal Univ, Minist Educ, Key Lab Biodivers Sci & Ecol Engn, Coll Life Sci, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
de Bruijn graph; ploidy estimation; polyploidy; whole genome sequencing; POLYPLOIDY; PLANTS; ACID;
DOI
10.1111/1755-0998.13720
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Although a few such tools have been developed, several aspects of this estimation still need improvement: the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives arising from errors and repeats. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of the allele frequency distribution, generated using the ggplot2 package, as well as quantitative results from a Gaussian mixture model. In addition, it takes advantage of the colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of the results can be improved by applying a threshold, such as one on Cramér's V coefficient, to variant features, which may significantly reduce the side effects of sequencing errors and repeats on the constructed graph structure.
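The abstract mentions thresholding on Cramér's V without giving the computation. As a rough illustration only, the sketch below computes the standard Cramér's V statistic from a contingency table of counts. The function name `cramers_v`, the table layout (rows as samples, columns as alleles of one variant) and the example counts are assumptions made for this illustration; they are not taken from ploidyfrost, whose actual feature definition and threshold are not described here.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table of non-negative counts.

    Hypothetical helper for illustration; not part of ploidyfrost.
    V = sqrt(chi2 / (n * (min(r, c) - 1))), ranging from 0 to 1.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        raise ValueError("table needs at least two rows and two columns")
    if any(t == 0 for t in row_totals + col_totals):
        raise ValueError("every row and column must have a positive total")

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (k - 1)))


# Assumed example counts: two samples (rows) by two alleles (columns).
perfectly_associated = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
print(cramers_v(perfectly_associated))
print(cramers_v(independent))
```

A value near 1 indicates that the row and column categories are strongly associated, and a value near 0 indicates near independence; how such a value would be compared against a cut-off for a given variant is left open here.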
Pages: 499-510
Number of pages: 12