Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions

Cited by: 32
Authors
Bianchi, Valerio [1,3,4]
Ceol, Arnaud [1 ]
Ogier, Alessandro G. E. [2 ]
de Pretis, Stefano [1 ]
Galeota, Eugenia [1 ]
Kishore, Kamal [1 ]
Bora, Pranami [1 ]
Croci, Ottavio [1 ]
Campaner, Stefano [1 ]
Amati, Bruno [1,2]
Morelli, Marco J. [1 ]
Pelizzola, Mattia [1 ]
Affiliations
[1] Fdn Ist Italiano Tecnol, Ctr Genom Sci, IIT SEMM, Milan, Italy
[2] European Inst Oncol, Dept Expt Oncol, Milan, Italy
[3] Hubrecht Inst KNAW, Uppsalalaan 8, NL-3584 CT Utrecht, Netherlands
[4] Univ Med Ctr, Uppsalalaan 8, NL-3584 CT Utrecht, Netherlands
Keywords
MICROARRAY; SOFTWARE; BIOINFORMATICS; FRAMEWORK; TAVERNA; BIOLOGY; SUITE; TOOL; RNA
DOI
10.3389/fgene.2016.00075
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five key issues that research groups approaching NGS technologies should carefully address: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations, and allows complete traceability of datasets, accompanying metadata, and analysis scripts.
Pages: 8