Leveraging Hadoop Framework to develop Duplication Detector and analysis using MapReduce, Hive and Pig

Authors
Sethi, Priyanka [1]
Kumar, Prakash [1]
Affiliation
[1] Jaypee Inst Informat Technol, Dept Comp Sci Engn & Informat Technol, Noida 201307, India
Keywords
NoSQL; Deduplication; Hadoop; MapReduce; HDFS; HBase; MongoDB; Hive; Pig;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The volume of data continues to grow exponentially in the age of the Internet of Things. As this torrent of digital datasets continues to grow in datacenters, attention needs to shift to methods for reducing stored data, particularly for NoSQL databases, because traditional structured storage systems struggle to provide the storage, throughput and computational power needed to capture, store, manage and analyze the deluge of data. Deduplication systems retain a single copy of redundant data on disk to save space, but what if certain copies must be kept intentionally and the rest eliminated selectively? This paper leverages the Hadoop framework to design and develop a duplication detection system that detects multiple copies of the same data at the file level, before transmission. Thereafter, various datasets are tuned for better performance and analysed using MapReduce, Hive and Pig.
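The abstract does not say how the detector identifies identical files, so the following is only a minimal sketch of one common approach to file-level duplicate detection: hash each file's contents, then group files by digest in a map-then-reduce shape. The choice of SHA-256, the function names (`file_digest`, `map_phase`, `reduce_phase`, `find_duplicates`) and the local, single-process execution are all assumptions made for illustration; this is not the paper's implementation and it does not run on Hadoop.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def map_phase(paths):
    """Map step (hypothetical): emit one (digest, path) pair per file."""
    for p in paths:
        yield file_digest(p), str(p)


def reduce_phase(pairs):
    """Reduce step (hypothetical): group paths by digest, keep groups of 2+."""
    groups = defaultdict(list)
    for digest, path in pairs:
        groups[digest].append(path)
    return {d: sorted(ps) for d, ps in groups.items() if len(ps) > 1}


def find_duplicates(root):
    """Report sets of byte-identical files under root without deleting any."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return reduce_phase(map_phase(files))
```

Because the sketch only reports duplicate groups and deletes nothing, the caller can decide which copies to keep, which matches the abstract's point about retaining certain copies intentionally.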
Pages: 454-460
Page count: 7