Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing

Cited by: 1
Authors
Ahmad, Awais [1]
Paul, Anand [2]
Din, Sadia [2]
Rathore, M. Mazhar [2]
Choi, Gyu Sang [1]
Jeon, Gwanggil [3]
Affiliations
[1] Yeungnam Univ, Dept Informat & Commun Engn, Gyeongbuk, South Korea
[2] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
[3] Incheon Natl Univ, Dept Embedded Syst Engn, Incheon, South Korea
Keywords
Big Data; HPC; Parallel processing algorithm; Four-tier system architecture; Data analytics; MapReduce
DOI
10.1007/s10766-017-0498-x
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The growing gap between users and Big Data analytics calls for innovative tools that address the challenges posed by the volume, variety, and velocity of big data. Analyzing such a massive volume of data is computationally inefficient. Moreover, advances in Big Data applications and data science pose additional challenges, and High-Performance Computing (HPC) solutions have become a key issue that has attracted attention in recent years. However, existing systems are either memoryless or computationally inefficient. There is therefore a need for a system that can efficiently analyze a stream of Big Data within these requirements. This paper presents a system architecture that enhances traditional MapReduce by incorporating a parallel processing algorithm. A complete four-tier architecture is also proposed that efficiently aggregates the data, eliminates unnecessary data, and analyzes the data with the proposed parallel processing algorithm. The proposed system architecture supports both read and write operations, which enhances the efficiency of Input/Output operations. To check the efficiency of the proposed algorithms, we implemented the system using Hadoop and MapReduce. MapReduce is supported by a parallel algorithm that efficiently processes huge data sets. The system is implemented using MapReduce on top of Hadoop parallel nodes to generate and process graphs in near real time. The system is evaluated in terms of efficiency by considering system throughput and processing time. The results show that the proposed system is more scalable and efficient.
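The aggregate, filter, and parallel-analyze flow described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' algorithm or their Hadoop implementation: the function names (`aggregate`, `is_useful`, `map_chunk`, `merge_counts`, `run_pipeline`), the word-count workload, and the use of a thread pool in place of Hadoop nodes are all assumptions chosen only to show the general map/reduce shape.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def aggregate(sources):
    """Aggregation stage (hypothetical): merge records from all sources."""
    return [record for source in sources for record in source]


def is_useful(record):
    """Filtering stage (hypothetical): drop empty or whitespace-only records."""
    return bool(record.strip())


def map_chunk(chunk):
    """Map step: count words within one chunk of records."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts."""
    return left + right


def run_pipeline(sources, workers=4, chunk_size=2):
    """Aggregate, filter, map chunks concurrently, then reduce the partials."""
    records = [r for r in aggregate(sources) if is_useful(r)]
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    demo_sources = [["big data", ""], ["data stream", "  ", "big"]]
    print(run_pipeline(demo_sources))
```

A thread pool keeps the sketch self-contained; in CPython it does not give true CPU parallelism, so a real deployment would distribute the map step across processes or cluster nodes, as the paper does with Hadoop.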
Pages: 508-527
Number of pages: 20
Related papers
50 in total
  • [1] Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing
    Awais Ahmad
    Anand Paul
    Sadia Din
    M. Mazhar Rathore
    Gyu Sang Choi
    Gwanggil Jeon
    [J]. International Journal of Parallel Programming, 2018, 46 : 508 - 527
  • [2] High-Performance Computing for Big Data Processing
    Wu, Yulei
    Xiang, Yang
    Ge, Jingguo
    Muller, Peter
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 88 : 693 - 695
  • [3] Contributions to High-Performance Big Data Computing
    Fox, Geoffrey
    Qiu, Judy
    Crandall, David
    Von Laszewski, Gregor
    Beckstein, Oliver
    Paden, John
    Paraskevakos, Ioannis
    Jha, Shantenu
    Wang, Fusheng
    Marathe, Madhav
    Vullikanti, Anil
    Cheatham, Thomas
    [J]. FUTURE TRENDS OF HPC IN A DISRUPTIVE SCENARIO, 2019, 34 : 34 - 81
  • [4] High Performance Computing Applications Using Parallel Data Processing Units
    Azadbakht, Keyvan
    Serbanescu, Vlad
    de Boer, Frank
    [J]. FUNDAMENTALS OF SOFTWARE ENGINEERING, FSEN 2015, 2015, 9392 : 191 - 206
  • [5] High-Performance Computing based Scalable Online Fuzzy Clustering Algorithms for Big Data
    Jha, Preeti
    Tiwari, Aruna
    Bharill, Neha
    Ratnaparkhe, Milind
    Patel, Om Prakash
    Pulakitha, Rapolu
    Chauhan, Aditi
    [J]. 2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 1400 - 1407
  • [6] Perspectives on High-Performance Computing in a Big Data World
    Fox, Geoffrey C.
    [J]. HPDC'19: PROCEEDINGS OF THE 28TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, 2019, : 145 - 145
  • [7] Multilevel Active Storage for Big Data Applications in High Performance Computing
    Chen, Chao
    Lang, Michael
    Chen, Yong
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [8] An Overview on the Convergence of High Performance Computing and Big Data Processing
    Mei, Songzhu
    Guan, Hongtao
    Wang, Qinglin
    [J]. 2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 1046 - 1051
  • [9] High Performance Processing of Satellite Data Using Distributed and Parallel Computing Techniques
    Damahe, Lalit B.
    Bramhe, Sanket S.
    Fursule, Nilay C.
    Shirbhate, Ram D.
    Ajmire, Pournima S.
    Kumar, Girish
    [J]. BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (14): : 404 - 409
  • [10] Scalable, high-performance data mining with parallel processing
    Freitas, AA
    [J]. PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 1510 : 477 - 477