ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs

Cited by: 5
Authors
Sun, Mingzhu
Pang, Erli
Bai, Wei-Ning
Zhang, Da-Yong
Lin, Kui [1, 2]
Affiliations
[1] Beijing Normal Univ, State Key Lab Earth Surface Proc & Resource Ecol, Beijing, Peoples R China
[2] Beijing Normal Univ, Minist Educ, Key Lab Biodivers Sci & Ecol Engn, Coll Life Sci, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
de Bruijn graph; ploidy estimation; polyploidy; whole genome sequencing; POLYPLOIDY; PLANTS; ACID;
DOI
10.1111/1755-0998.13720
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
Pages: 499-510
Number of pages: 12
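Note on Cramér's V: the abstract names Cramér's V coefficient as one possible threshold on variant features but does not say how it is computed inside ploidyfrost. As a minimal sketch of the standard statistic only (not the tool's own implementation), the function below computes V = sqrt(chi2 / (n * (k - 1))) for a contingency table, where k is the smaller of the row and column counts. The function name `cramers_v` and the example tables are hypothetical and chosen purely for illustration.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Illustrative sketch of the textbook formula; what the rows and columns
    would represent in ploidyfrost is an assumption, not taken from the paper.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        raise ValueError("need a non-empty table with at least 2 rows and 2 columns")

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical tables: perfect association and no association.
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[5, 5], [5, 5]])
```

V ranges from 0 (no association between the two categorical variables) to 1 (perfect association), which is what makes it usable as a filtering threshold; the cut-off value appropriate for a given data set is not given in the abstract.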