Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions

Cited by: 32
Authors
Bianchi, Valerio [1, 3, 4]
Ceol, Arnaud [1 ]
Ogier, Alessandro G. E. [2 ]
de Pretis, Stefano [1 ]
Galeota, Eugenia [1 ]
Kishore, Kamal [1 ]
Bora, Pranami [1 ]
Croci, Ottavio [1 ]
Campaner, Stefano [1 ]
Amati, Bruno [1, 2]
Morelli, Marco J. [1 ]
Pelizzola, Mattia [1 ]
Affiliations
[1] Fdn Ist Italiano Tecnol, Ctr Genom Sci, IIT SEMM, Milan, Italy
[2] European Inst Oncol, Dept Expt Oncol, Milan, Italy
[3] Hubrecht Inst KNAW, Uppsalalaan 8, NL-3584 CT Utrecht, Netherlands
[4] Univ Med Ctr, Uppsalalaan 8, NL-3584 CT Utrecht, Netherlands
Keywords
MICROARRAY; SOFTWARE; BIOINFORMATICS; FRAMEWORK; TAVERNA; BIOLOGY; SUITE; TOOL; RNA;
DOI
10.3389/fgene.2016.00075
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five key issues that research groups approaching NGS technologies should carefully address: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets, using controlled vocabularies where possible, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations, and allows complete traceability of datasets, accompanying metadata, and analysis scripts.
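Issues (2) and (3) above can be illustrated with a minimal Python sketch. It is not the authors' implementation and is not taken from the workflow system the abstract introduces; the function names, the directory layout, and the record fields are all hypothetical and chosen only to show how a predictable output location and a traceability record might be derived from the same identifiers.

```python
import hashlib
from pathlib import PurePosixPath


def output_dir(root, sample_id, protocol, version):
    """Build a predictable output directory (hypothetical layout).

    The same sample analyzed with two protocols lands in two sibling
    directories, so results never overwrite each other.
    """
    return PurePosixPath(root) / sample_id / f"{protocol}-v{version}"


def provenance_record(sample_id, protocol, version, script_text, params):
    """Collect the minimum needed to trace a result back to its inputs.

    The script is stored as a SHA-256 digest so that any later change
    to the analysis code is detectable. Field names are assumptions.
    """
    digest = hashlib.sha256(script_text.encode("utf-8")).hexdigest()
    return {
        "sample_id": sample_id,
        "protocol": protocol,
        "version": version,
        "script_sha256": digest,
        "params": dict(sorted(params.items())),
    }


if __name__ == "__main__":
    # Hypothetical sample and protocol names, for illustration only.
    print(output_dir("/data/ngs", "S001", "rnaseq", "1.0"))
    print(provenance_record("S001", "rnaseq", "1.0", "echo align", {"threads": 4}))
```

Deriving both the path and the record from the same identifiers keeps file locations and traceability metadata consistent with each other by construction.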
Pages: 8