SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

Cited by: 6
Authors
Audoux, Jerome [1,2]
Salson, Mikael [3]
Grosset, Christophe F. [4]
Beaumeunier, Sacha [1,2]
Holder, Jean-Marc [1,2]
Commes, Therese [1,2]
Philippe, Nicolas [1,2]
Affiliations
[1] CHR Montpellier, Hop St Eloi, IRMB, SeqOne, 80 Ave Augustin Fliche, F-34295 Montpellier, France
[2] Inst Computat Biol, 860 Rue St Priest, F-34095 Montpellier 5, France
[3] Univ Lille, CNRS, CRIStAL Ctr Rech Informat Signal & Automat Lille, INRIA,Cent Lille,UMR 9189, F-59000 Lille, France
[4] Univ Bordeaux, INSERM, BMGIC, U1035, F-33076 Bordeaux, France
Source
BMC Bioinformatics | 2017 / Vol. 18
Keywords
RNA-Seq; Transcriptomics; Benchmark; Pipeline optimization; Framework; Alignment; Discovery
DOI
10.1186/s12859-017-1831-5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. To help answer these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematically evaluating and comparing their performance, so that users can make well-informed choices.

Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that match specific real biological conditions as closely as possible, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatic pipeline run on a SimCT dataset with the simulated genomic and transcriptional variations that the dataset contains, giving an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that combining the output of certain pipelines could yield substantial performance improvements.

Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated in order to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process.

Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatic pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. SimBA software and dataset are available at http://cractools.gforge.inria.fr/softwares/simba/.
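The comparison step attributed to BenchCT, scoring a pipeline's output against the events the simulator inserted, can be illustrated in a much simplified form. The Python (3.9+) sketch below is a hypothetical illustration, not SimBA's actual implementation, scoring rules or file format: the `(chromosome, position, kind)` event tuples, the exact-match rule (real benchmarks typically allow some positional tolerance), the toy data, and the names `Scores`, `score`, `TRUTH`, `PIPELINE_A` and `PIPELINE_B` are all assumptions. It also shows two simple ways of combining the output of two pipelines: keeping only shared calls (intersection) or keeping every call (union).

```python
from dataclasses import dataclass

# Hypothetical event encoding (an assumption): (chromosome, position, event kind).
Event = tuple[str, int, str]


@dataclass(frozen=True)
class Scores:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def sensitivity(self) -> float:
        # Fraction of simulated (true) events that the pipeline recovered.
        total_true = self.true_positives + self.false_negatives
        return self.true_positives / total_true if total_true else 0.0

    @property
    def precision(self) -> float:
        # Fraction of the pipeline's calls that match a simulated event.
        total_called = self.true_positives + self.false_positives
        return self.true_positives / total_called if total_called else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of sensitivity and precision.
        s, p = self.sensitivity, self.precision
        return 2 * s * p / (s + p) if s + p else 0.0


def score(predicted: set[Event], truth: set[Event]) -> Scores:
    """Score calls against the simulated ground truth using exact matching (an assumption)."""
    return Scores(
        true_positives=len(predicted & truth),
        false_positives=len(predicted - truth),
        false_negatives=len(truth - predicted),
    )


# Invented toy data standing in for a simulator's list of inserted events.
TRUTH: set[Event] = {
    ("chr1", 100, "SNV"),
    ("chr1", 250, "SNV"),
    ("chr2", 40, "indel"),
    ("chr3", 7, "fusion"),
}
# Invented calls from two hypothetical pipelines, each with one miss and one false call.
PIPELINE_A: set[Event] = {("chr1", 100, "SNV"), ("chr1", 250, "SNV"), ("chr2", 40, "indel"), ("chr4", 3, "SNV")}
PIPELINE_B: set[Event] = {("chr1", 100, "SNV"), ("chr2", 40, "indel"), ("chr3", 7, "fusion"), ("chr5", 9, "SNV")}

if __name__ == "__main__":
    candidates = {
        "A": PIPELINE_A,
        "B": PIPELINE_B,
        "A and B": PIPELINE_A & PIPELINE_B,  # keep only calls both pipelines agree on
        "A or B": PIPELINE_A | PIPELINE_B,   # keep calls made by either pipeline
    }
    for name, calls in candidates.items():
        s = score(calls, TRUTH)
        print(f"{name:<8} sensitivity={s.sensitivity:.2f} precision={s.precision:.2f} F1={s.f1:.2f}")
```

With these toy sets, each pipeline alone reaches 0.75 sensitivity and 0.75 precision; the intersection raises precision to 1.00 but drops sensitivity to 0.50, while the union raises sensitivity to 1.00 at a precision of 0.67. The example only shows why combining call sets can shift performance; it does not reproduce the paper's results.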
Pages: 14