An Automated Infrastructure to Support High-Throughput Bioinformatics

Authors
Cuccuru, Gianmauro [1]
Leo, Simone [1]
Lianas, Luca [1]
Muggiri, Michele [1]
Pinna, Andrea [1]
Pireddu, Luca [1]
Uva, Paolo [1]
Angius, Andrea [1]
Fotia, Giorgio [1]
Zanetti, Gianluigi [1]
Affiliations
[1] CRS4, Pula, CA, Italy
Keywords
Bioinformatics; NGS; MapReduce; Data management; Variants; Galaxy
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters and ensuring reproducibility, error control and usability by nontechnical staff. Here we describe an automated infrastructure built to address the above issues in the context of the analysis of the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
Pages: 600-607
Number of pages: 8
Related papers
50 in total
  • [21] Automated, high-throughput serum glycoprofiling platform
    Stoeckmann, H.
    O'Flaherty, R.
    Adamczyk, B.
    Saldova, R.
    Rudd, P. M.
    [J]. INTEGRATIVE BIOLOGY, 2015, 7 (09) : 1026 - 1032
  • [22] A high-throughput adaptive computing infrastructure for bioinformatics research
    Pineo, S
    Wang, ZY
    [J]. 18th International Conference on Systems Engineering, Proceedings, 2005 : 292 - 300
  • [23] A high-throughput overlay multicast infrastructure with network coding
    Wang, M
    Li, ZP
    Li, BC
    [J]. QUALITY OF SERVICE - IWQOS 2005, PROCEEDINGS, 2005, 3552 : 37 - 53
  • [24] An infrastructure for high-throughput microscopy: Instrumentation, informatics, and integration
    Vaisberg, Eugeni A.
    Lenzi, David
    Hansen, Richard L.
    Keon, Brigitte H.
    Finer, Jeffrey T.
    [J]. MEASURING BIOLOGICAL RESPONSES WITH AUTOMATED MICROSCOPY, 2006, 414 : 484 - 512
  • [25] High-throughput neuroimaging-genetics computational infrastructure
    Dinov, Ivo D.
    Petrosyan, Petros
    Liu, Zhizhong
    Eggert, Paul
    Hobel, Sam
    Vespa, Paul
    Moon, Seok Woo
    Van Horn, John D.
    Franco, Joseph
    Toga, Arthur W.
    [J]. FRONTIERS IN NEUROINFORMATICS, 2014, 8
  • [26] A high-throughput infrastructure for density functional theory calculations
    Jain, Anubhav
    Hautier, Geoffroy
    Moore, Charles J.
    Ong, Shyue Ping
    Fischer, Christopher C.
    Mueller, Tim
    Persson, Kristin A.
    Ceder, Gerbrand
    [J]. COMPUTATIONAL MATERIALS SCIENCE, 2011, 50 (08) : 2295 - 2310
  • [27] The challenges of delivering bioinformatics training in the analysis of high-throughput data
    Carvalho, Benilton S.
    Rustici, Gabriella
    [J]. BRIEFINGS IN BIOINFORMATICS, 2013, 14 (05) : 538 - 547
  • [28] High-throughput bioinformatics with the Cyrille2 pipeline system
    Fiers, Mark W. E. J.
    van der Burgt, Ate
    Datema, Erwin
    de Groot, Joost C. W.
    van Ham, Roeland C. H. J.
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [29] High-throughput species identification: from DNA isolation to bioinformatics
    Richardson, David E.
    Vanwye, Jeffrey D.
    Exum, Amy M.
    Cowen, Robert K.
    Crawford, Douglas L.
    [J]. MOLECULAR ECOLOGY NOTES, 2007, 7 (02) : 199 - 207
  • [30] Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data
    Weng, Julia Tzu-Ya
    Wu, Li-Ching
    Chang, Wen-Chi
    Chang, Tzu-Hao
    Akutsu, Tatsuya
    Lee, Tzong-Yi
    [J]. BIOMED RESEARCH INTERNATIONAL, 2014, 2014