An Automated Infrastructure to Support High-Throughput Bioinformatics

Authors
Cuccuru, Gianmauro [1]
Leo, Simone [1]
Lianas, Luca [1]
Muggiri, Michele [1]
Pinna, Andrea [1]
Pireddu, Luca [1]
Uva, Paolo [1]
Angius, Andrea [1]
Fotia, Giorgio [1]
Zanetti, Gianluigi [1]
Affiliations
[1] CRS4, Pula, CA, Italy
Keywords
Bioinformatics; NGS; MapReduce; Data management; Variants; Galaxy
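DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract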
The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters, and ensuring reproducibility, error control, and usability by nontechnical staff. Here we describe an automated infrastructure built to address these issues in the context of analyzing the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
Pages: 600-607
Number of pages: 8