ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs

Cited by: 5
Authors
Sun, Mingzhu
Pang, Erli
Bai, Wei-Ning
Zhang, Da-Yong
Lin, Kui [1, 2]
Affiliations
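[1] Beijing Normal University, State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing, People's Republic of China
[2] Beijing Normal University, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing, People's Republic of China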
Funding
National Key Research and Development Program;
Keywords
de Bruijn graph; ploidy estimation; polyploidy; whole genome sequencing; plants; acid
DOI
10.1111/1755-0998.13720
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Polyploidy is ubiquitous, and its consequences are complex and variable. A change in ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms of different ploidy. Computational tools for ploidy estimation are urgently needed, both to avoid cumbersome experiments and to take advantage of the less biased information in the vast amounts of available genome sequencing data. Although a few such tools have been developed, several aspects need further improvement: the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives arising from sequencing errors and repeats. We have developed ploidyfrost, a de Bruijn graph-based method that estimates ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of the allele frequency distribution, generated with the ggplot2 package, as well as quantitative results from a Gaussian mixture model. In addition, it uses the colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost on highly heterozygous or repetitive samples of Cyclocarya paliurus and on a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of the results can be improved by applying a threshold, such as one on Cramér's V coefficient, to variant features, which may significantly reduce the effects of sequencing errors and repeats on the structure of the constructed graph.
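The two quantities the abstract leans on can be illustrated with a minimal Python sketch. It is not ploidyfrost's implementation: the function names `cramers_v` and `expected_allele_fractions` are hypothetical, and the abstract does not say how the tool builds its contingency table from variant features, so the tables below are made-up examples. The sketch computes Cramér's V from the textbook formula V = sqrt(chi2 / (n * min(r - 1, c - 1))) and lists the allele fractions at which heterozygous sites of an idealised genome of a given ploidy would be expected to peak, which is where the components of a mixture model would be expected to centre.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table of non-negative counts.

    Hypothetical helper: how ploidyfrost derives its table is not stated
    in the abstract, so the caller supplies the counts directly.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k < 1:
        raise ValueError("need at least a 2 x 2 table with a non-zero total")

    # Pearson chi-squared statistic; cells with zero expected count are skipped.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * k))


def expected_allele_fractions(ploidy):
    """Idealised allele fractions k / ploidy for k = 1 .. ploidy - 1."""
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")
    return [k / ploidy for k in range(1, ploidy)]


if __name__ == "__main__":
    # Made-up counts: perfect association, then no association.
    print(cramers_v([[10, 0], [0, 10]]))
    print(cramers_v([[5, 5], [5, 5]]))
    # Idealised peaks for a tetraploid.
    print(expected_allele_fractions(4))
```

A filter in the spirit of the abstract would keep or discard a variant depending on which side of a chosen cut-off its V falls; both the cut-off value and the direction of the comparison are left open here because the abstract does not give them.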
Pages: 499-510
Page count: 12