Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing

Cited by: 0
Authors
Awais Ahmad
Anand Paul
Sadia Din
M. Mazhar Rathore
Gyu Sang Choi
Gwanggil Jeon
Affiliations
[1] Yeungnam University, Department of Information and Communication Engineering
[2] Kyungpook National University, School of Computer Science and Engineering
[3] Incheon National University, Department of Embedded Systems Engineering
Keywords
Big Data; HPC; Parallel processing algorithm; Four-tier system architecture
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The growing gap between users and Big Data analytics calls for innovative tools that address the challenges posed by the volume, variety, and velocity of big data. Analyzing such a massive volume of data becomes computationally inefficient. Moreover, advances in Big Data applications and data science pose additional challenges, for which High-Performance Computing (HPC) solutions have become a key issue and have attracted attention in recent years. However, existing systems are either memoryless or computationally inefficient. There is therefore a need for a system that can efficiently analyze a stream of Big Data within these requirements. Hence, this paper presents a system architecture that enhances traditional MapReduce by incorporating a parallel processing algorithm. A complete four-tier architecture is also proposed that efficiently aggregates the data, eliminates unnecessary data, and analyzes the data with the proposed parallel processing algorithm. The proposed system architecture supports both read and write operations, which enhances the efficiency of input/output operations. To evaluate the efficiency of the algorithms used in the proposed architecture, we implemented the system using Hadoop and MapReduce. MapReduce is supported by a parallel algorithm that efficiently processes huge data sets. The system is implemented using MapReduce on top of Hadoop parallel nodes to generate and process graphs in near real time. The system is evaluated in terms of efficiency by considering system throughput and processing time. The results show that the proposed system is more scalable and efficient.
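The flow the abstract describes (eliminate unnecessary data, process the rest in parallel, then merge the partial results) can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their Hadoop implementation: the word-count task, the filtering rule, and all function names (`filter_records`, `split_into_chunks`, `map_chunk`, `merge_counts`, `parallel_word_count`) are assumptions chosen only to show the general map/reduce shape, and a thread pool stands in for Hadoop's parallel nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def filter_records(records):
    """Elimination step: drop empty or whitespace-only records (hypothetical rule)."""
    return [r for r in records if r and r.strip()]


def split_into_chunks(records, n_chunks):
    """Split the records into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: count the words in one chunk."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def parallel_word_count(records, n_workers=4):
    """Filter, map the chunks concurrently, then reduce the partial results."""
    cleaned = filter_records(records)
    chunks = split_into_chunks(cleaned, n_workers)
    # Threads stand in for cluster nodes here; this is an assumption of the sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())
```

Calling `parallel_word_count(["big data", "", "Big Data HPC"], n_workers=2)` returns a `Counter` of word frequencies. In CPython, threads do not speed up CPU-bound counting; the sketch shows only the structure of the pipeline, not the throughput gains the paper reports.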
Pages: 508-527
Page count: 19