A MapReduce-based scalable discovery and indexing of structured big data

Cited by: 23
Authors
Singh, Hari [1]
Bawa, Seema [1]
Affiliations
[1] Thapar Univ, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Hadoop; Distributed computing; MapReduce; HDFS; Cluster; B-Tree
DOI
10.1016/j.future.2017.03.028
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Various methods and techniques have been proposed in the past for improving the performance of queries on structured and unstructured data. The paper proposes a parallel B-Tree index in the MapReduce framework that improves the efficiency of random reads over existing approaches. The benefit of using the MapReduce framework is that it hides the complexity of implementing parallelism and fault tolerance from users and presents these in a user-friendly way. The proposed index reduces the number of data accesses for range queries and thus improves efficiency. The B-Tree index on MapReduce is implemented in a chained-MapReduce process that reduces intermediate data access time between successive map and reduce functions, and improves efficiency. Finally, five performance metrics are used to validate the performance of the proposed index for range search queries in MapReduce, including the effect of varying cluster size and range search query coverage on execution time, the number of map tasks, and the size of Input/Output (I/O) data. The effect of varying the Hadoop Distributed File System (HDFS) block size, and an analysis of heap memory size and of the intermediate data generated during map and reduce functions, also show the superiority of the proposed index. Experimental results show that the parallel B-Tree index in a chained-MapReduce environment performs better than both the default non-indexed Hadoop dataset and the B-Tree-like Global Index (Zhao et al., 2012) in MapReduce. (C) 2017 Elsevier B.V. All rights reserved.
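The abstract's claim that an index "reduces the number of data accesses for range queries" can be illustrated with a minimal single-process sketch. This is not the paper's parallel B-Tree or its chained-MapReduce implementation: it assumes a flat sorted list of per-block key bounds standing in for the leaf level of a B-Tree, and the names `build_block_index` and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_block_index(records, block_size):
    """Sort (key, value) records, split them into fixed-size blocks,
    and keep each block's smallest and largest key as the index."""
    ordered = sorted(records, key=lambda r: r[0])
    blocks = [ordered[i:i + block_size]
              for i in range(0, len(ordered), block_size)]
    min_keys = [b[0][0] for b in blocks]
    max_keys = [b[-1][0] for b in blocks]
    return blocks, min_keys, max_keys


def range_query(blocks, min_keys, max_keys, lo, hi):
    """Return records with lo <= key <= hi and the number of blocks read.

    Only blocks whose key bounds overlap [lo, hi] are touched; a
    non-indexed scan would have to read every block.
    """
    first = bisect_left(max_keys, lo)   # first block whose max key >= lo
    last = bisect_right(min_keys, hi)   # one past last block whose min key <= hi
    hits = [r for b in blocks[first:last] for r in b if lo <= r[0] <= hi]
    return hits, max(0, last - first)


if __name__ == "__main__":
    # Hypothetical data: 100 records in 10 blocks of 10 keys each.
    data = [(k, f"row{k}") for k in range(100)]
    blocks, min_keys, max_keys = build_block_index(data, block_size=10)
    hits, blocks_read = range_query(blocks, min_keys, max_keys, 25, 34)
    print(len(hits), "records from", blocks_read, "of", len(blocks), "blocks")
```

In this toy setting the query reads only the blocks overlapping the requested key range, which is the effect the abstract attributes to the index; the distribution of blocks across map tasks is left out.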
Pages: 32 - 43
Number of pages: 12
Related papers
50 in total
  • [31] MapReduce-based parallel GEP algorithm for efficient function mining in big data applications
    Liu, Yang
    Ma, Chenxiao
    Xu, Lixiong
    Shen, Xiaodong
    Li, Maozhen
    Li, Pengcheng
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [32] Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
    Hai Jiang
    Yi Chen
    Zhi Qiao
    Tien-Hsiung Weng
    Kuan-Ching Li
    Cluster Computing, 2015, 18 : 369 - 383
  • [33] LandQv2: A MapReduce-Based System for Processing Arable Land Quality Big Data
    Yao, Xiaochuang
    Mokbel, Mohamed E.
    Ye, Sijing
    Li, Guoqing
    Alarabi, Louai
    Eldawy, Ahmed
    Zhao, Zuliang
    Zhao, Long
    Zhu, Dehai
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (07)
  • [34] MapReduce Based Scalable Range Query Architecture for Big Spatial Data
    Eken, Suleyman
    Kizgindere, Umut
    Sayar, Ahmet
    RISE OF BIG SPATIAL DATA, 2017, : 263 - 272
  • [35] A MapReduce-based distributed SVM ensemble for scalable image classification and annotation
    Alham, Nasullah Khalid
    Li, Maozhen
    Liu, Yang
    Qi, Man
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2013, 66 (10) : 1920 - 1934
  • [36] MapReduce Based Scalable Range Query Architecture for Big Spatial Data
    Kizgindere, Umut
    Eken, Suleyman
    Sayar, Ahmet
    2015 IEEE/ACS 12TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2015,
  • [37] A MapReduce-based distributed and scalable framework for stitching of satellite mosaic images
    Eken S.
    Sayar A.
    Arabian Journal of Geosciences, 2021, 14 (18)
  • [38] Gaussian relevance vector MapReduce-based annealed Glowworm optimization for big medical data scheduling
    Patan, Rizwan
    Kallam, Suresh
    Gandomi, Amir H.
    Hanne, Thomas
    Ramachandran, Manikandan
    JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2022, 73 (10) : 2204 - 2215
  • [39] A MapReduce-Based Nearest Neighbor Approach for Big-Data-Driven Traffic Flow Prediction
    Xia, Dawen
    Li, Huaqing
    Wang, Binfeng
    Li, Yantao
    Zhang, Zili
    IEEE ACCESS, 2016, 4 : 2920 - 2934
  • [40] A MapReduce-Based Big Spatial Data Framework for Solving the Problem of Covering a Polygon with Orthogonal Rectangles
    Eken, Suleyman
    Sayar, Ahmet
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2019, 26 (01) : 36 - 42