ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs

Cited by: 5
Authors
Sun, Mingzhu
Pang, Erli
Bai, Wei-Ning
Zhang, Da-Yong
Lin, Kui [1, 2]
Affiliations
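[1] Beijing Normal University, State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing, People's Republic of China
[2] Beijing Normal University, Ministry of Education, Key Laboratory of Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing, People's Republic of China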
Funding
National Key Research and Development Program of China
Keywords
de Bruijn graph; ploidy estimation; polyploidy; whole genome sequencing; POLYPLOIDY; PLANTS; ACID;
DOI
10.1111/1755-0998.13720
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
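The link between an allele frequency distribution and a ploidy call can be illustrated with a minimal Python sketch. It is not ploidyfrost's implementation: it assumes that allele frequencies at heterozygous sites of a p-ploid sample cluster around i/p for i = 1..p-1, and it scores each candidate ploidy with a simplified Gaussian mixture whose means are fixed at those values, with equal weights and a shared standard deviation. The function names (`log_likelihood`, `estimate_ploidy`, `simulate`), the `sigma` values and the simulated data are all hypothetical and chosen for illustration only.

```python
import math
import random


def log_likelihood(freqs, ploidy, sigma=0.04):
    """Log-likelihood of allele frequencies under an equal-weight Gaussian
    mixture with means fixed at i/ploidy for i = 1..ploidy-1 (assumption)."""
    means = [i / ploidy for i in range(1, ploidy)]
    weight = 1.0 / len(means)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in freqs:
        density = sum(
            weight * norm * math.exp(-0.5 * ((x - m) / sigma) ** 2)
            for m in means
        )
        # Guard against log(0) for points far from every component.
        total += math.log(max(density, 1e-300))
    return total


def estimate_ploidy(freqs, candidates=range(2, 9), sigma=0.04):
    """Return the candidate ploidy with the highest log-likelihood."""
    return max(candidates, key=lambda p: log_likelihood(freqs, p, sigma))


def simulate(ploidy, n_sites=600, sigma=0.03, seed=0):
    """Simulate noisy allele frequencies for a sample of the given ploidy."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_sites):
        dose = rng.randint(1, ploidy - 1)      # copies of the alternative allele
        x = rng.gauss(dose / ploidy, sigma)    # observed frequency with noise
        freqs.append(min(max(x, 0.0), 1.0))    # keep within [0, 1]
    return freqs


if __name__ == "__main__":
    for true_ploidy in (2, 3, 4):
        print(true_ploidy, estimate_ploidy(simulate(true_ploidy)))
```

Because every component carries weight 1/(p-1), a higher candidate ploidy whose peaks include those of the true ploidy is penalised for spreading probability over peaks that the data do not occupy. A full mixture fit would also estimate weights and variances from the data, which this sketch leaves out.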
Pages: 499-510
Number of pages: 12