The Use of Distributed Data Storage and Processing Systems in Bioinformatic Data Analysis

Cited by: 0
Authors
Bochenek, Michal [4]
Folkert, Kamil [4]
Jaksik, Roman [3]
Krzesiak, Michal [4]
Michalak, Marcin [1]
Sikora, Marek [2]
Steclik, Tomasz [1]
Wrobel, Lukasz [2]
Affiliations
[1] Inst Innovat Technol EMAG, Ul Leopolda 31, PL-40189 Katowice, Poland
[2] Silesian Tech Univ, Inst Informat, Ul Akad 16, PL-44100 Gliwice, Poland
[3] Silesian Tech Univ, Inst Automat Control, Ul Akad 16, PL-44100 Gliwice, Poland
[4] 3 Soft SA, Ul Porcelanowa 23, PL-40246 Katowice, Poland
Keywords
Hadoop ecosystem; Biomedical data; Distributed computing; TCGA data analysis; Gene mutations; Sequence alignment; Protein alignment
DOI
10.1007/978-3-319-99987-6_2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Cancer and cancer mortality may seem to be a sign of the present times, which has led hundreds of scientists to search for significant premises of cancer occurrence. This paper defines a set of data mining tasks that link observed gene mutations with observations of specific cancer types. Because analyzing this kind of data is computationally demanding, a Hadoop ecosystem cluster was developed to perform the required calculations. The results may be considered satisfactory in two domains: distributed data storage and processing, and the interpretation of gene mutation occurrence.
Pages: 18-32
Page count: 15