Multi-file Queries Performance Improvement through Data Placement in Hadoop

Cited by: 0
Authors
Tang, Yu [1]
Abdulhay, Elham [1]
Fan, Aihua
Su, Sheng [1]
Gebreselassie, Kidus [1]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu 611731, Peoples R China
Keywords
HDFS; Block Placement; Data locality; Correlation;
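DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract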
Hadoop is popular for processing data-intensive jobs because of its data locality feature. However, the performance gained from this feature is currently limited by Hadoop's default block placement policy, which implicitly assumes that instances of MapReduce jobs access data from a single file. In contrast, multi-file queries such as indexing or aggregation queries need to process related data from more than one file located on different DataNodes of a cluster. In this paper we propose a Correlation-based Block Placement (CBP) algorithm that improves the performance of these queries by placing related blocks on the same set of DataNodes. Furthermore, we develop a customized InputFormat that enables InputSplits to contain records from different files. Simulation results show that the number of migrating data blocks for CBP was insignificant. Under the default policy, by contrast, the number of migrating data blocks increased significantly with the input dataset size. As a result, for every input dataset size tested, the performance of CBP exceeded that of the default policy.
Pages: 986-991
Page count: 6
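
The placement idea summarized in the abstract (related blocks from different files stored on the same set of DataNodes) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's CBP algorithm: the correlation groups are assumed to be given in advance, the function names `place_by_correlation`, `place_round_robin`, and `blocks_to_migrate` are hypothetical, and the round-robin baseline is a simple stand-in for a correlation-unaware policy rather than HDFS's actual default placement logic.

```python
from collections import defaultdict


def place_by_correlation(block_groups, nodes, replication=3):
    """Toy placement: every block in a correlation group gets the same replica set.

    block_groups maps a (hypothetical) block id to a correlation-group id.
    nodes is a list of DataNode names. Both are assumptions for illustration.
    """
    if replication > len(nodes):
        raise ValueError("replication exceeds number of nodes")
    load = {n: 0 for n in nodes}   # blocks stored per node so far
    group_nodes = {}               # group id -> chosen replica set
    placement = {}
    for block, group in block_groups.items():
        if group not in group_nodes:
            # A new group goes to the currently least-loaded nodes.
            ranked = sorted(nodes, key=lambda n: (load[n], n))
            group_nodes[group] = ranked[:replication]
        placement[block] = list(group_nodes[group])
        for n in group_nodes[group]:
            load[n] += 1
    return placement


def place_round_robin(blocks, nodes, replication=3):
    """Correlation-unaware stand-in: spread replicas over consecutive nodes."""
    if replication > len(nodes):
        raise ValueError("replication exceeds number of nodes")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replication)]
        for i, block in enumerate(blocks)
    }


def blocks_to_migrate(placement, query_blocks):
    """Blocks of one query that are missing from the best single node."""
    per_node = defaultdict(int)
    for block in query_blocks:
        for n in placement[block]:
            per_node[n] += 1
    return len(query_blocks) - max(per_node.values())


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    # Blocks a0/a1 belong to file A, b0/b1 to file B; a0 and b0 are related.
    groups = {"a0": 0, "b0": 0, "a1": 1, "b1": 1}
    grouped = place_by_correlation(groups, nodes, replication=2)
    spread = place_round_robin(["a0", "a1", "b0", "b1"], nodes, replication=2)
    print(blocks_to_migrate(grouped, ["a0", "b0"]))
    print(blocks_to_migrate(spread, ["a0", "b0"]))
```

In this toy setting, a query over the related blocks `a0` and `b0` finds both on one node under the grouped placement, while the spread placement leaves one of them remote. The model says nothing about the magnitudes reported in the paper's simulations.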