Handling Big Data Using a Data-Aware HDFS and Evolutionary Clustering Technique

Cited by: 15
Authors
Hajeer, Mustafa [1, 2]
Dasgupta, Dipankar [3, 4]
Affiliations
[1] Univ Memphis, Memphis, TN 38152 USA
[2] Intel Data Ctr Grp, Santa Clara, CA 95054 USA
[3] Univ Memphis, Ctr Informat Assurance, Memphis, TN 38152 USA
[4] Univ Memphis, Intelligent Secur Syst Res Lab, Memphis, TN 38152 USA
Keywords
Clustering methods; distributed computing; information management; optimization; scalability; COMMUNITY DETECTION; MODULARITY; FRAMEWORK; MODEL
DOI
10.1109/TBDATA.2017.2782785
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The increased use of cyber-enabled systems and the Internet of Things (IoT) has led to a massive amount of data with different structures. Most big data solutions are built on top of the Hadoop ecosystem or use its distributed file system (HDFS). However, studies have shown that such systems are inefficient when dealing with today's data. Some research has overcome these problems for specific types of graph data, but today's data comprise more than one type. Such inefficiency may lead to large-scale problems, including larger space requirements in data centers and wasted resources (such as power), which in turn lead to environmental problems (such as higher carbon emissions) [1]. We propose a data-aware module for the Hadoop ecosystem. We also propose a distributed encoding technique for genetic algorithms to enable efficient data processing. Our framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. It handles a broad range of data types and optimizes query time and resource usage. We performed experiments on multiple datasets generated with LUBM (the Lehigh University Benchmark) and report the results along with a performance analysis.
Pages: 134-147
Number of pages: 14
Related papers
50 in total
  • [1] Data-Aware Support for Hybrid HPC and Big Data Applications
    Caino-Lores, Silvina
    Isaila, Florin
    Carretero, Jesus
    [J]. 2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2017, : 719 - 722
  • [2] A Data-Aware Remote Procedure Call Method for Big Data Systems
    Wang, Jin
    Yang, Yaqiong
    Zhang, Jingyu
    Yu, Xiaofeng
    Alfarraj, Osama
    Tolba, Amr
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2020, 35 (06): : 523 - 532
  • [3] Data-Aware Clustering Hierarchy for wireless sensor networks
    Wu, Xiaochen
    Wang, Peng
    Wang, Wei
    Shi, Baile
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2008, 5012 : 795 - 802
  • [4] Genetic Algorithm based Data-aware Group Scheduling for Big Data Clouds
    Kune, Raghavendra
    Konugurthi, Pramod Kumar
    Agarwal, Arun
    Chillarige, Raghavendra Rao
    Buyya, Rajkumar
    [J]. 2014 IEEE/ACM INTERNATIONAL SYMPOSIUM ON BIG DATA COMPUTING (BDC), 2014, : 96 - 104
  • [5] Data-aware multicast
    Baehni, S
    Eugster, PT
    Guerraoui, R
    [J]. 2004 INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2004, : 233 - 242
  • [6] POSH: A Data-Aware Shell
    Raghavan, Deepti
    Fouladi, Sadjad
    Levis, Philip
    Zaharia, Matei
    [J]. PROCEEDINGS OF THE 2020 USENIX ANNUAL TECHNICAL CONFERENCE, 2020, : 617 - 631
  • [7] Handling Big Data Efficiently by using Map Reduce Technique
    Maitrey, Seema
    Jha, C. K.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION TECHNOLOGY CICT 2015, 2015, : 703 - 708
  • [8] BIG DATA RETRIEVAL USING HDFS WITH LZO COMPRESSION
    Prasanth, T.
    Aarthi, K.
    Gunasekaran, M.
    [J]. PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATION ENGINEERING (ICACCE-2019), 2019,
  • [9] A data-aware resource broker for data grids
    Le, H
    Coddington, P
    Wendelborn, AL
    [J]. NETWORK AND PARALLEL COMPUTING, PROCEEDINGS, 2004, 3222 : 73 - 82
  • [10] Data-Aware Compression for HPC using Machine Learning
    Plehn, Julius
    Fuchs, Anna
    Kuhn, Michael
    Luettgau, Jakob
    Ludwig, Thomas
    [J]. OPERATING SYSTEMS REVIEW, 2022, 56 (01) : 62 - 69