Optimization of Hadoop cluster for analyzing large-scale sequence data in bioinformatics

Times cited: 0

Authors:

Toth, Adam [1]

Karimi, Ramin [1]

Affiliation:

[1] Univ Debrecen, Fac Informat, Debrecen, Hungary

Source:

ANNALES MATHEMATICAE ET INFORMATICAE | 2019 / Vol. 50

Keywords:

hadoop; optimization; next-Generation Sequencing; DNA signature; resource management; TECHNOLOGIES;

DOI:

10.33039/ami.2019.01.002

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Subject classification code:

0701 ; 070101 ;

Abstract:

The unexpected growth of high-throughput sequencing platforms in recent years has impacted virtually all areas of modern biology. However, the ability to produce data continues to outpace the ability to analyze it. Continuous effort is therefore needed to improve bioinformatics applications so that these research opportunities can be better exploited. Owing to the complexity and diversity of metagenomics data, metagenomics has become one of the most challenging fields of bioinformatics. Sequence-based identification methods, such as those using DNA signatures (unique k-mers), are among the most popular recent methods for real-time analysis of raw sequencing data. DNA signature discovery, however, is compute-intensive and time-consuming. Hadoop, a framework for parallel and distributed computing, is a popular platform for analyzing large-scale data in bioinformatics. The main goals of this paper are to optimize running time and computational resource usage, such as CPU consumption and memory usage, and to manage the nodes of the Hadoop cluster.
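As a rough illustration of how DNA signature discovery fits Hadoop's MapReduce model, the following Python sketch simulates the map, shuffle and reduce phases in a single process. It assumes a signature is simply a k-mer that occurs in exactly one input genome; the function names (`map_kmers`, `shuffle`, `reduce_signatures`, `find_signatures`) are hypothetical, and this is not the method or code used in the paper. On a real cluster the map and reduce functions would run as distributed tasks (for example via Hadoop Streaming), and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def map_kmers(genome_id: str, sequence: str, k: int) -> Iterator[Tuple[str, str]]:
    """Map step: emit (k-mer, genome_id) for every length-k window of one sequence."""
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as 'N'.
        if set(kmer) <= {"A", "C", "G", "T"}:
            yield kmer, genome_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Shuffle step: group genome ids by k-mer key, as Hadoop does between map and reduce.

    A set is used so repeated occurrences of a k-mer within one genome collapse.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for kmer, genome_id in pairs:
        groups[kmer].add(genome_id)
    return groups


def reduce_signatures(groups: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Reduce step: a k-mer seen in exactly one genome is kept as a signature of it."""
    signatures: Dict[str, List[str]] = defaultdict(list)
    for kmer, owners in groups.items():
        if len(owners) == 1:
            (owner,) = owners
            signatures[owner].append(kmer)
    return {gid: sorted(kmers) for gid, kmers in signatures.items()}


def find_signatures(genomes: Dict[str, str], k: int) -> Dict[str, List[str]]:
    """Run map -> shuffle -> reduce over {genome_id: sequence}; return signatures per genome."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    pairs = (
        pair
        for genome_id, sequence in genomes.items()
        for pair in map_kmers(genome_id, sequence, k)
    )
    return reduce_signatures(shuffle(pairs))


if __name__ == "__main__":
    toy = {"g1": "ACGTAC", "g2": "ACGTTT"}
    print(find_signatures(toy, k=4))
    # prints {'g1': ['CGTA', 'GTAC'], 'g2': ['CGTT', 'GTTT']}
```

In the toy run, `ACGT` occurs in both genomes and is discarded, while the remaining 4-mers are unique to one genome each. The sketch ignores reverse complements and sequencing errors, which a practical signature tool would need to handle; the shuffle phase is also where most network traffic and memory pressure arise on a real cluster, which is one reason resource tuning matters for this workload.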


Pages: 187-202

Number of pages: 16

Related records (50 in total; records 1-10 shown):

[1] A Data Locality Optimization Algorithm for Large-scale Data Processing in Hadoop. Zhao, Yanrong; Wang, Weiping; Meng, Dan; Yang, Xiufeng; Zhang, Shubin; Li, Jun; Guan, Gang. [J]. 2012 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2012: 655-661.
[2] Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network with Hadoop. Liu, Jun; Liu, Feng; Ansari, Nirwan. [J]. IEEE NETWORK, 2014, 28(4): 32-39.
[3] Large-Scale Machine Learning and Optimization for Bioinformatics Data Analysis. Cheng, Jianlin. [J]. ACM-BCB 2020 - 11TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2020.
[4] BioPig: a Hadoop-based analytic toolkit for large-scale sequence data. Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong. [J]. BIOINFORMATICS, 2013, 29(23): 3014-3019.
[5] Hadoop-HBase for Large-Scale Data. Vora, Mehul Nalin. [J]. 2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012: 601-605.
[6] Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop. Schultz, Joshua; Vieyra, Jonathan; Lu, Enyue. [J]. 2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012: 1459.
[7] Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop. Schultz, Joshua; Vierya, Jonathan; Lu, Enyue. [J]. 2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012: 1457-+.
[8] Large-Scale Pairwise Sequence Alignments on a Large-Scale GPU Cluster. Savran, Ibrahim; Gao, Yang; Bakos, Jason D. [J]. IEEE DESIGN & TEST, 2014, 31(1): 51-61.
[9] Large-scale open bioinformatics data resources. Stupka, E. [J]. CURRENT OPINION IN MOLECULAR THERAPEUTICS, 2002, 4(3): 265-274.
[10] Efficient bioinformatics approaches for large-scale data analysis. Hautaniemi, S. [J]. FEBS JOURNAL, 2011, 278: 27.
