SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

Cited by: 6
Authors
Audoux, Jerome [1, 2]
Salson, Mikael [3]
Grosset, Christophe F. [4]
Beaumeunier, Sacha [1, 2]
Holder, Jean-Marc [1, 2]
Commes, Therese [1, 2]
Philippe, Nicolas [1, 2]
Affiliations
[1] CHR Montpellier, Hop St Eloi, IRMB, SeqOne, 80 Ave Augustin Fliche, F-34295 Montpellier, France
[2] Inst Computat Biol, 860 Rue St Priest, F-34095 Montpellier 5, France
[3] Univ Lille, CNRS, CRIStAL Ctr Rech Informat Signal & Automat Lille, INRIA, Cent Lille, UMR 9189, F-59000 Lille, France
[4] Univ Bordeaux, INSERM, BMGIC, U1035, F-33076 Bordeaux, France
Source
BMC Bioinformatics | 2017 / Vol. 18
Keywords
RNA-Seq; Transcriptomics; Benchmark; Pipeline optimization; Framework; Alignment; Discovery
DOI
10.1186/s12859-017-1831-5
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. To provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices.
Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, giving an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved.
Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process.
Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. The SimBA software and dataset are available at http://cractools.gforge.inria.fr/softwares/simba/.
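The comparison described in the abstract, scoring a pipeline's output against a list of simulated events and then checking whether combining pipelines helps, can be illustrated with a minimal sketch. This is not BenchCT's implementation or file format: the `precision_recall` helper, the `(chromosome, position, allele)` event keys, and all data values are hypothetical and exist only to show the general idea.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(truth: Iterable[Hashable], calls: Iterable[Hashable]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted events against the simulated truth."""
    truth_set, call_set = set(truth), set(calls)
    true_positives = len(truth_set & call_set)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(call_set) if call_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


# Hypothetical simulated mutations, keyed as (chromosome, position, alternate allele).
truth = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 75, "G"), ("chr3", 10, "C")}

# Hypothetical calls from two pipelines run on the same simulated dataset.
pipeline_a = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr5", 5, "G")}
pipeline_b = {("chr1", 100, "A"), ("chr2", 75, "G"), ("chr7", 42, "T")}

results = {
    "A": precision_recall(truth, pipeline_a),
    "B": precision_recall(truth, pipeline_b),
    # Union keeps every event reported by either pipeline (favours recall).
    "A or B": precision_recall(truth, pipeline_a | pipeline_b),
    # Intersection keeps only events both pipelines agree on (favours precision).
    "A and B": precision_recall(truth, pipeline_a & pipeline_b),
}

for name, (precision, recall) in results.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the union of the two call sets recovers more of the simulated events than either pipeline alone, while the intersection reports fewer events but no false positives. Which combination is preferable would depend on the biological question being asked.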
Pages: 14