Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes

Authors
Wu, Ling-Yi [1]
Wijesekara, Yasas [2]
Piedade, Goncalo J. [3, 4]
Pappas, Nikolaos [1]
Brussaard, Corina P. D. [3, 4]
Dutilh, Bas E. [1, 5]
Affiliations
[1] Univ Utrecht, Theoret Biol & Bioinformat, Science4Life, Padualaan 8, NL-3584 CH Utrecht, Netherlands
[2] Univ Med Greifswald, Inst Bioinformat, Felix Hausdorff Str 8, D-17475 Greifswald, Germany
[3] NIOZ Royal Netherlands Inst Sea Res, Dept Marine Microbiol & Biogeochem, POB 59, NL-1790 AB Den Burg, Texel, Netherlands
[4] Univ Amsterdam, Inst Biodivers & Ecosyst Dynam, Amsterdam, Netherlands
[5] Friedrich Schiller Univ Jena, Inst Biodivers, Fac Biol Sci, Cluster Excellence Balance Microverse, D-07743 Jena, Germany
Source
GENOME BIOLOGY | 2024 / Vol. 25 / Issue 01
Keywords
MARINE VIRUSES; ALIGNMENT; SAMPLES; REVEAL; GENES; PHAGE
DOI
10.1186/s13059-024-03236-4
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct an independent benchmarking to give users objective guidance.
Results: We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0-97%) and false positive rates (0-30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered.
Conclusions: Together, our independent benchmarking facilitates selecting choices of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.
Pages: 23
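
The true positive and false positive rates quoted in the abstract, and their dependence on the chosen cutoff, can be illustrated with a minimal sketch. It assumes that a tool assigns each contig a score and that contigs scoring at or above a cutoff are called viral. The function name `rates_at_cutoff` and all scores below are hypothetical and are not taken from the paper or from any of the benchmarked tools.

```python
from typing import Sequence, Tuple


def rates_at_cutoff(
    viral_scores: Sequence[float],
    microbial_scores: Sequence[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for one cutoff.

    Assumption: a contig is called viral when its score is >= cutoff.
    viral_scores come from the viral dataset, microbial_scores from the
    paired microbial dataset.
    """
    if not viral_scores or not microbial_scores:
        raise ValueError("both score lists must be non-empty")
    true_positives = sum(score >= cutoff for score in viral_scores)
    false_positives = sum(score >= cutoff for score in microbial_scores)
    return (
        true_positives / len(viral_scores),
        false_positives / len(microbial_scores),
    )


# Hypothetical scores, for illustration only.
viral = [0.95, 0.80, 0.60, 0.40]
microbial = [0.70, 0.30, 0.20, 0.10]

for cutoff in (0.5, 0.75):
    tpr, fpr = rates_at_cutoff(viral, microbial, cutoff)
    print(f"cutoff={cutoff}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

In this toy example, raising the cutoff lowers both rates, which is the kind of trade-off a user would weigh when adjusting a tool's cutoff before use.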