Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions

Cited by: 32
Authors
Bianchi, Valerio [1, 3, 4]
Ceol, Arnaud [1]
Ogier, Alessandro G. E. [2]
de Pretis, Stefano [1]
Galeota, Eugenia [1]
Kishore, Kamal [1]
Bora, Pranami [1]
Croci, Ottavio [1]
Campaner, Stefano [1]
Amati, Bruno [1, 2]
Morelli, Marco J. [1]
Pelizzola, Mattia [1]
Affiliations
[1] Fdn Ist Italiano Tecnol, Ctr Genom Sci, IIT SEMM, Milan, Italy
[2] European Inst Oncol, Dept Expt Oncol, Milan, Italy
[3] Hubrecht Inst KNAW, Uppsalalaan 8, NL-3584 CT Utrecht, Netherlands
[4] Univ Med Ctr, Uppsalalaan 8, NL-3584 CT Utrecht, Netherlands
Keywords
MICROARRAY; SOFTWARE; BIOINFORMATICS; FRAMEWORK; TAVERNA; BIOLOGY; SUITE; TOOL; RNA
DOI
10.3389/fgene.2016.00075
Chinese Library Classification
Q3 [Genetics]
Discipline codes
071007; 090102
Abstract
Next-generation sequencing (NGS) technologies have profoundly changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have now accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which research groups lacking IT and quantitative expertise can easily underestimate. In this Perspective, we identify five key issues that research groups approaching NGS technologies should carefully address: (1) adopting a laboratory information management system (LIMS) and preserving the resulting raw data structure in downstream analyses; (2) monitoring the flow of data and standardizing input and output directories and file names, even when multiple analysis protocols are applied to the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in light of these issues, and we introduce HIS-flow, a new workflow management system conceived to address the concerns we raised. HIS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, writes outputs to standard locations and ensures complete traceability of datasets, accompanying metadata and analysis scripts.
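Issues (2) and (3) can be made concrete with a small example. The following is a minimal Python sketch, assuming a hypothetical project/sample/protocol_version directory layout and a JSON provenance record keyed on a SHA-256 hash of the analysis script. The names `AnalysisRun`, `output_dir`, `output_file` and `provenance_record`, and the layout itself, are illustrative assumptions, not HIS-flow's actual conventions or API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AnalysisRun:
    """Hypothetical record of one analysis protocol applied to one sample."""
    project: str
    sample_id: str         # assumed to be the identifier stored in the LIMS
    protocol: str          # hypothetical protocol name, e.g. "rnaseq_align"
    protocol_version: str


def output_dir(root: str, run: AnalysisRun) -> PurePosixPath:
    # The same inputs always map to the same directory, and each protocol
    # gets its own subdirectory, so several protocols run on one sample
    # never overwrite each other's results.
    return (PurePosixPath(root) / run.project / run.sample_id
            / f"{run.protocol}_v{run.protocol_version}")


def output_file(root: str, run: AnalysisRun, suffix: str) -> PurePosixPath:
    # File names repeat sample and protocol so a file stays identifiable
    # even if it is copied out of its directory.
    return output_dir(root, run) / f"{run.sample_id}.{run.protocol}.{suffix}"


def provenance_record(run: AnalysisRun, script_text: str) -> str:
    # Hashing the exact script text ties each result to the code that
    # produced it; the UTC timestamp records when the run happened.
    record = asdict(run)
    record["script_sha256"] = hashlib.sha256(script_text.encode("utf-8")).hexdigest()
    record["timestamp_utc"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True, indent=2)


run = AnalysisRun("proj01", "S001", "rnaseq_align", "1.2")
print(output_file("/data/analysis", run, "bam"))
# /data/analysis/proj01/S001/rnaseq_align_v1.2/S001.rnaseq_align.bam
print(provenance_record(run, "echo align\n"))
```

Deriving every path from the same few LIMS-provided fields is one simple way to keep output locations predictable; a fuller provenance record would likely also capture input file checksums and tool versions.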
Pages: 8
Related papers
  • [31] Wind turbine vibration management: An integrated analysis of existing solutions, products, and Open-source developments
    Machado, M. R.
    Dutkiewicz, M.
    ENERGY REPORTS, 2024, 11 : 3756 - 3791
  • [32] Checkpointing and Rollback Recovery in Distributed Systems: Existing Solutions, Open Issues and Proposed Solutions
    Manivannan, D.
    NEW ASPECTS OF SYSTEMS, PTS I AND II, 2008, : 569 - +
  • [33] PANORAUMA: AN INTEGRATED OPEN COMMONS FOR MANAGEMENT AND SHARING OF NEUROTRAUMA DATA
    Huie, J. Russell
    Radabaugh, Hannah
    Fond, Kenneth
    Vavrek, Romana
    Chiu, Michael
    Keller, Anastasia
    Gensel, John
    Visser, Ubbo
    Bixby, John
    Lemmon, Vance
    Grethe, Jeffrey
    Martone, Maryann
    Torres-Espin, Abel
    Ferguson, Adam
    JOURNAL OF NEUROTRAUMA, 2023, 40 (15-16) : A29 - A30
  • [34] An integrated approach to federated identity and privilege management in open systems
    Bhatti, Rafae
    Bertino, Elisa
    Ghafoor, Arif
    COMMUNICATIONS OF THE ACM, 2007, 50 (02) : 81 - 87
  • [36] A Survey of Thermal Management in Cloud Data Centre: Techniques and Open Issues
    Rani, Rama
    Garg, Ritu
    WIRELESS PERSONAL COMMUNICATIONS, 2021, 118 (01) : 679 - 713
  • [38] Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data
    Schultheiss, Sebastian J.
    Jean, Geraldine
    Behr, Jonas
    Bohnert, Regina
    Drewe, Philipp
    Goernitz, Nico
    Kahles, Andre
    Mudrakarta, Pramod
    Sreedharan, Vipin T.
    Zeller, Georg
    Raetsch, Gunnar
    BMC BIOINFORMATICS, 2011, 12
  • [39] Building Connections: Using Integrated Administrative Data to Identify Issues and Solutions Spanning the Child Welfare and Child Support Systems
    Howard, Lanikque
    Vogel, Lisa Klein
    Cancian, Maria
    Noyes, Jennifer L.
    RSF-THE RUSSELL SAGE JOURNAL OF THE SOCIAL SCIENCES, 2019, 5 (02) : 70 - 85
  • [40] Development of Effective Knowledge Management Systems: Review and Open Research Issues
    Sukumaran, Sanath
    Simon, Casper Gihes Kaun
    Chandran, Kanchana
    PROCEEDINGS OF THE 19TH EUROPEAN CONFERENCE ON KNOWLEDGE MANAGEMENT (ECKM 2018), VOLS 1 AND 2, 2018, : 829 - 837