A MapReduce-based scalable discovery and indexing of structured big data

Cited by: 23
Authors
Singh, Hari [1]
Bawa, Seema [1]
Affiliations
[1] Thapar Univ, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Hadoop; Distributed computing; MapReduce; HDFS; Cluster; B-Tree
DOI
10.1016/j.future.2017.03.028
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Various methods and techniques have been proposed in the past for improving the performance of queries on structured and unstructured data. This paper proposes a parallel B-Tree index in the MapReduce framework that improves the efficiency of random reads over existing approaches. The benefit of using the MapReduce framework is that it hides the complexity of implementing parallelism and fault tolerance from users and presents these capabilities in a user-friendly way. The proposed index reduces the number of data accesses for range queries and thus improves efficiency. The B-Tree index on MapReduce is implemented as a chained-MapReduce process that reduces intermediate data access time between successive map and reduce functions, further improving efficiency. Finally, five performance metrics are used to validate the performance of the proposed index for range search queries in MapReduce, including the effect of varying cluster size and range search query coverage on execution time, the number of map tasks, and the size of Input/Output (I/O) data. The effect of varying the Hadoop Distributed File System (HDFS) block size, together with an analysis of heap memory size and of the intermediate data generated during map and reduce functions, also shows the superiority of the proposed index. Experimental results show that the parallel B-Tree index in a chained-MapReduce environment performs better than both the default non-indexed Hadoop dataset and the B-Tree-like Global Index (Zhao et al., 2012) in MapReduce. © 2017 Elsevier B.V. All rights reserved.
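The abstract's claim that an index reduces the number of data accesses for range queries can be illustrated with a small sketch. This is not the paper's parallel B-Tree or its chained-MapReduce implementation: it swaps in a sorted sparse index (one minimum key per block) searched with binary search, and plain Python lists stand in for HDFS blocks. The function names, the block layout, and the assumption of distinct keys are all hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_blocks(keys, block_size):
    """Split sorted keys into fixed-size blocks (a stand-in for HDFS blocks)."""
    ordered = sorted(keys)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def build_index(blocks):
    """Sparse index: the smallest key of each block, in block order."""
    return [block[0] for block in blocks]


def range_query(blocks, index, low, high):
    """Return (matching keys, blocks read) for low <= key <= high.

    Assumes distinct keys, so a key equal to a block's minimum
    cannot also appear in the preceding block.
    """
    # Last block whose minimum key is <= low; clamp to 0 if low precedes all keys.
    first = max(bisect_right(index, low) - 1, 0)
    # Blocks whose minimum key is > high cannot contain matches (exclusive bound).
    last = bisect_right(index, high)
    hits = []
    for block in blocks[first:last]:
        # Binary search inside each candidate block as well.
        start = bisect_left(block, low)
        stop = bisect_right(block, high)
        hits.extend(block[start:stop])
    return hits, max(last - first, 0)
```

With 100 distinct keys in blocks of 10, a query for keys 25 through 47 reads 3 of the 10 blocks under this sketch, whereas a non-indexed scan would read all 10.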
Pages: 32-43
Page count: 12