A Model for Distributed Processing and Analyses of NGS Data under Map-Reduce Paradigm

Cited by: 7
Authors
Samaddar, Sandip [1]
Sinha, Rituparna [2]
De, Rajat K. [3]
Affiliations
[1] Heritage Inst Technol, Dept Comp Sci & Engn, Kolkata 700107, India
[2] Heritage Inst Technol, Dept Informat Technol, Kolkata 700107, India
[3] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, India
Keywords
CNV; Hadoop; personalised medicine; cancer; NGS; fault tolerant model; bioinformatics analytical workflow; genome
DOI
10.1109/TCBB.2018.2816022
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Massively parallel sequencing, introduced by next-generation sequencing (NGS) technology, has led to exponential growth in sequencing data at greatly reduced cost and increased throughput. This explosion of data has introduced new challenges in storage, integration, processing, and analysis. In this paper, we propose a novel distributed model under the Map-Reduce paradigm to address the NGS big data problem. The model's architecture is a modularized, Map-Reduce based approach with three phases that support various analytical pipelines. The first phase generates detailed base-level information for individual genomes by granulating the alignment data. The other two phases independently process this base-level information in parallel: one provides an integrated DNA profile of multiple individuals, whereas the other generates contigs with similar features within an individual. Each of the three phases generates a repository of genomic information that facilitates other analytical pipelines. Simulated and real experimental prototypes are provided as results to show the effectiveness of the model and its superiority over a few existing popular models and tools. A detailed description of the scope of applications of this model is also included in this article.
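The three-phase structure described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' Hadoop implementation: the toy alignment tuples, the `map_reduce` helper, and the choice of read depth as the "base level information" and of equal depth as the "similar feature" are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy alignments: (individual, chromosome, start, length).
ALIGNMENTS = [
    ("ind1", "chr1", 0, 4),
    ("ind1", "chr1", 2, 4),
    ("ind2", "chr1", 1, 3),
]

def map_reduce(records, mapper, reducer):
    """Minimal single-process map-reduce: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

# Phase 1 (assumed form): granulate each alignment into per-base records
# and sum them, giving read depth per (individual, chromosome, position).
def granulate(alignment):
    ind, chrom, start, length = alignment
    for pos in range(start, start + length):
        yield (ind, chrom, pos), 1

def sum_values(_key, values):
    return sum(values)

base_level = map_reduce(ALIGNMENTS, granulate, sum_values)

# Phase 2 (assumed form): integrate base-level data across individuals,
# keyed by genomic position.
def by_position(item):
    (ind, chrom, pos), depth = item
    yield (chrom, pos), (ind, depth)

def collect_profile(_key, values):
    return dict(values)

profile = map_reduce(base_level.items(), by_position, collect_profile)

# Phase 3 (assumed form): within one individual, merge adjacent positions
# that share the same depth into (start, end, depth) contigs.
def by_individual(item):
    (ind, chrom, pos), depth = item
    yield (ind, chrom), (pos, depth)

def merge_runs(_key, values):
    contigs = []
    for pos, depth in sorted(values):
        if contigs and contigs[-1][1] == pos - 1 and contigs[-1][2] == depth:
            contigs[-1] = (contigs[-1][0], pos, depth)
        else:
            contigs.append((pos, pos, depth))
    return contigs

contigs = map_reduce(base_level.items(), by_individual, merge_runs)
```

Phases 2 and 3 both read only the phase 1 output and do not depend on each other, which is what would allow them to run in parallel on a real cluster.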
Pages: 827-840
Page count: 14