Big Data Analysis using Apache Hadoop

Cited by: 0
Authors
Manikandan, Shankar Ganesh [1]
Ravi, Siddarth [1]
Affiliation
[1] Dhanalakshmi Coll Engn, Dept Informat Technol, Madras, Tamil Nadu, India
Keywords
Big Data Analysis; Big Data Management; Map Reduce; HDFS
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
We live in an on-demand, on-command digital universe in which institutions, individuals and machines generate data at a very high rate. This data is categorized as "Big Data" because of its sheer Volume, Variety and Velocity. Most of it is unstructured, quasi-structured or semi-structured, and it is heterogeneous in nature. The volume and heterogeneity of the data, together with the speed at which it is generated, make it difficult for present computing infrastructure to manage Big Data. Traditional data management, warehousing and analysis systems lack the tools to analyze this data. Because of its specific nature, Big Data is stored in distributed file system architectures. Apache Hadoop and HDFS are widely used for storing and managing Big Data. Analyzing Big Data is a challenging task because it involves large distributed file systems that should be fault tolerant, flexible and scalable. Map Reduce is widely used for the efficient analysis of Big Data. Traditional DBMS techniques such as joins and indexing, and other techniques such as graph search, are used for classification and clustering of Big Data. These techniques are being adapted for use in Map Reduce. In this paper we suggest various methods for addressing the problems at hand through the Map Reduce framework over the Hadoop Distributed File System (HDFS). Map Reduce is a minimization technique that makes use of file indexing with mapping, sorting, shuffling and finally reducing. This paper studies Map Reduce techniques implemented for Big Data analysis using HDFS.
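The map, shuffle, sort and reduce stages named in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not use Hadoop or HDFS. The word-count task, the function names and the sample input are all assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle: group values by key. Sort: return the groups ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped}


def word_count(records):
    """Chain the three stages (hypothetical helper, for illustration only)."""
    return reduce_phase(shuffle_and_sort(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read splits from HDFS.
    sample = ["big data on hadoop", "hadoop stores big data in hdfs"]
    print(word_count(sample))
```

In an actual Hadoop job the map and reduce functions would run in parallel on many nodes and the framework would perform the shuffle and sort between them. Here all three stages run sequentially in one process, only to show how data flows from one stage to the next.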
Pages: 4
Related papers
50 in total
  • [1] Processing of Big Educational Data in the Cloud Using Apache Hadoop
    Machova, Renata
    Komarkova, Jitka
    Lnenicka, Martin
    [J]. INTERNATIONAL CONFERENCE ON INFORMATION SOCIETY (I-SOCIETY 2016), 2016, : 46 - 49
  • [2] Shared Disk Big Data Analytics with Apache Hadoop
    Mukherjee, Anirban
    Datta, Joydip
    Jorapur, Raghavendra
    Singhvi, Ravi
    Haloi, Saurav
    Akram, Wasim
    [J]. 2012 19TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2012,
  • [3] Analysis of Big Data Storage Tools for Data Lakes based on Apache Hadoop Platform
    Belov, Vladimir
    Nikulchev, Evgeny
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (08) : 551 - 557
  • [4] Developing a big data analytics platform using Apache Hadoop Ecosystem for delivering big data services in libraries
    Singh, Ranjeet Kumar
    [J]. DIGITAL LIBRARY PERSPECTIVES, 2024, 40 (02) : 160 - 186
  • [5] Big Data Analysis Using Hadoop Cluster
    Saldhi, Ankita
    Goel, Abhinav
    Yadav, Dipesh
    Saldhi, Ankur
    Saksena, Dhruv
    Indu, S.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC), 2014, : 572 - 575
  • [6] Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive
    Garg, Varun
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 938 - 941
  • [7] Information Retrieval Using Hadoop Big Data Analysis
    Motwani, Deepak
    Madan, Madan Lal
    [J]. ADVANCES IN OPTICAL SCIENCE AND ENGINEERING, 2015, 166 : 409 - 415
  • [8] CLUSTERING AND INDEXING OF MULTIPLE DOCUMENTS USING FEATURE EXTRACTION THROUGH APACHE HADOOP ON BIG DATA
    Lydia, E. Laxmi
    Moses, G. Jose
    Varadarajan, Vijayakumar
    Nonyelu, Fredi
    Maseleno, Andino
    Perumal, Eswaran
    Shankar, K.
    [J]. MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2020, : 108 - 123
  • [9] Analyzing and Scripting Indian Election strategies using Big Data via Apache Hadoop framework
    Jagdev, Gagandeep
    Kaur, Amandeep
    [J]. 2016 5TH INTERNATIONAL CONFERENCE ON WIRELESS NETWORKS AND EMBEDDED SYSTEMS (WECON), 2016, : 59 - 67
  • [10] Performance Analysis of ECG Big Data using Apache Hive and Apache Pig
    Ahmad, Mudassar
    Kanwal, Safina
    Cheema, Maryam
    Habib, Muhammad Asif
    [J]. 2019 8TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES (ICICT 2019), 2019, : 2 - 7