A MapReduce-based scalable discovery and indexing of structured big data

Cited by: 23
Authors
Singh, Hari [1]
Bawa, Seema [1]
Affiliations
[1] Thapar Univ, Comp Sci & Engn Dept, Patiala, Punjab, India
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2017 / Vol. 73
Keywords
Hadoop; Distributed computing; MapReduce; HDFS; Cluster; B-Tree;
DOI
10.1016/j.future.2017.03.028
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Various methods and techniques have been proposed in the past for improving the performance of queries on structured and unstructured data. This paper proposes a parallel B-Tree index in the MapReduce framework that improves the efficiency of random reads over existing approaches. The benefit of the MapReduce framework is that it hides the complexity of implementing parallelism and fault tolerance from users and exposes these capabilities in a user-friendly way. The proposed index reduces the number of data accesses for range queries and thus improves efficiency. The B-Tree index is implemented as a chained-MapReduce process, which reduces intermediate data access time between successive map and reduce functions and further improves efficiency. Finally, five performance metrics are used to validate the performance of the proposed index for range search queries in MapReduce, such as the effect of varying cluster size and range search query coverage on execution time, the number of map tasks, and the size of Input/Output (I/O) data. The effect of varying the Hadoop Distributed File System (HDFS) block size, together with an analysis of heap memory size and of the intermediate data generated during map and reduce functions, also shows the superiority of the proposed index. Experimental results show that the parallel B-Tree index with a chained-MapReduce environment performs better than both the default non-indexed Hadoop dataset and the B-Tree-like Global Index (Zhao et al., 2012) in MapReduce. (C) 2017 Elsevier B.V. All rights reserved.
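To illustrate why a key-ordered index can cut the number of data accesses for a range query, the following is a minimal single-process sketch in Python. It is not the paper's implementation: sorted lists searched with `bisect` stand in for B-Tree nodes, plain functions stand in for Hadoop map and reduce tasks, and all names (`map_phase`, `reduce_phase`, `range_query`, `boundaries`) as well as the sample records are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def map_phase(records, boundaries):
    """Map step (assumed design): tag each record with a key-range partition id."""
    for key, value in records:
        yield bisect_right(boundaries, key), (key, value)


def reduce_phase(mapped):
    """Reduce step (assumed design): build one key-sorted list per partition."""
    groups = defaultdict(list)
    for pid, pair in mapped:
        groups[pid].append(pair)
    return {pid: sorted(pairs) for pid, pairs in groups.items()}


def range_query(index, boundaries, lo, hi):
    """Return (hits, partitions_touched) for keys in the closed range [lo, hi]."""
    first = bisect_right(boundaries, lo)
    last = bisect_right(boundaries, hi)
    hits, touched = [], 0
    for pid in range(first, last + 1):
        part = index.get(pid, [])
        touched += 1
        keys = [k for k, _ in part]
        # Binary search inside the partition instead of scanning every record.
        hits.extend(part[bisect_left(keys, lo):bisect_right(keys, hi)])
    return hits, touched


# Hypothetical sample data: eight records split into four key ranges.
records = [(k, f"row-{k}") for k in (42, 7, 19, 88, 63, 5, 30, 71)]
boundaries = [25, 50, 75]

index = reduce_phase(map_phase(records, boundaries))
hits, touched = range_query(index, boundaries, 20, 45)
print(hits, touched)
```

In this toy setup the query for keys 20 to 45 reads two of the four partitions, whereas a non-indexed scan would read all of them. The chaining of successive MapReduce jobs described in the abstract is not modeled here.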
Pages: 32-43
Number of pages: 12
Related papers
50 in total
  • [21] Knowledge process of health big data using MapReduce-based associative mining
    So-Young Choi
    Kyungyong Chung
    Personal and Ubiquitous Computing, 2020, 24 : 571 - 581
  • [22] MR-DBSCAN:a scalable MapReduce-based DBSCAN algorithm for heavily skewed data
    Yaobin HE
    Haoyu TAN
    Wuman LUO
    Shengzhong FENG
    Jianping FAN
    Frontiers of Computer Science, 2014, 8 (01) : 83 - 99
  • [23] A scalable MapReduce-based design of an unsupervised entity resolution system
    Hagan, Nicholas Kofi Akortia
    Talburt, John R.
    Anderson, Kris E.
    Hagan, Deasia
    FRONTIERS IN BIG DATA, 2024, 7
  • [24] MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data
    He, Yaobin
    Tan, Haoyu
    Luo, Wuman
    Feng, Shengzhong
    Fan, Jianping
    FRONTIERS OF COMPUTER SCIENCE, 2014, 8 (01) : 83 - 99
  • [25] MassJoin: A MapReduce-based Method for Scalable String Similarity Joins
    Deng, Dong
    Li, Guoliang
    Hao, Shuang
    Wang, Jiannan
    Feng, Jianhua
    2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 340 - 351
  • [26] Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
    Jiang, Hai
    Chen, Yi
    Qiao, Zhi
    Weng, Tien-Hsiung
    Li, Kuan-Ching
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (01): : 369 - 383
  • [27] Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids
    Sardar T.H.
    Ansari Z.
    Journal of The Institution of Engineers (India): Series B, 2022, 103 (01) : 73 - 82
  • [28] MapReduce-Based Complex Big Data Analytics over Uncertain and Imprecise Social Networks
    Braun, Peter
    Cuzzocrea, Alfredo
    Jiang, Fan
    Leung, Carson Kai-Sang
    Pazdor, Adam G. M.
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2017, 2017, 10440 : 130 - 145
  • [29] MapReduce-Based Growing Neural Gas for Scalable Cluster Environments
    Fliege, Johannes
    Benn, Wolfgang
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION (MLDM 2016), 2016, 9729 : 545 - 559
  • [30] MapReduce-Based D_ELT Framework to Address the Challenges of Geospatial Big Data
    Jo, Junghee
    Lee, Kang-Woo
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2019, 8 (11)