Big Data Analysis Using Hadoop Cluster

Cited by: 0
Authors
Saldhi, Ankita [1]
Goel, Abhinav [2]
Yadav, Dipesh [3]
Saldhi, Ankur [4]
Saksena, Dhruv [5]
Indu, S. [6]
Affiliations
[1] Ctr Dev Telemat, Mandi Rd, Delhi 110030, India
[2] Aardee Solut, Delhi 110059, India
[3] Designo Interior, Delhi 110085, India
[4] Jamia Millia Islamia, Dept Comp Engn, Delhi 110025, India
[5] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[6] Delhi Technol Univ, Elect & Commun Engn Dept, Delhi 110042, India
Keywords
Big data; Hadoop; distributed data processing; data mining; Mappers; Reducers
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Industries track statistics about their business and process this data with various data mining techniques to measure profit trends, revenue, growing markets, and promising investment opportunities. These statistical records grow continuously and rapidly. As the data grows, processing such a large data set and extracting meaningful information becomes a tedious task. If the data is generated in various formats, processing it poses further challenges. Owing to its size, big data is stored in the Hadoop Distributed File System (HDFS). In this standard architecture, all the DataNodes operate in parallel, but each individual DataNode still executes its tasks sequentially. This paper proposes executing the tasks assigned to a single DataNode in parallel instead of sequentially. We propose using a group of streaming multiprocessors (SMs) for each DataNode. An SM can have multiple processors and memory, and all SMs run in parallel and independently. We process big data, which may come from different sources in different formats, in parallel on a Hadoop cluster using the proposed technique and obtain the desired results efficiently. We applied the proposed methodology to the raw data of an industrial firm, to support intelligent business decisions, with the final objective of finding the profit generated for the firm and its trends throughout a year. We analyzed one year of data because trends generally repeat annually.
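The idea in the abstract, running the tasks assigned to one node concurrently and then aggregating profit per period, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout (month, revenue, cost), the sample values, and the function names are hypothetical, and a thread pool merely stands in for the streaming multiprocessors described in the paper (CPython threads give concurrency, not true CPU parallelism).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records: (month, revenue, cost). Not data from the paper.
RECORDS = [
    ("2015-01", 120.0, 80.0),
    ("2015-01", 200.0, 150.0),
    ("2015-02", 90.0, 100.0),
    ("2015-02", 300.0, 180.0),
    ("2015-03", 50.0, 20.0),
]


def map_task(split):
    """Mapper: emit (month, profit) pairs for one input split."""
    return [(month, revenue - cost) for month, revenue, cost in split]


def reduce_task(pairs):
    """Reducer: sum the profit values per month."""
    totals = defaultdict(float)
    for month, profit in pairs:
        totals[month] += profit
    return dict(totals)


def run_node(records, workers=2):
    """Run one node's map tasks concurrently instead of one after another."""
    # Divide the node's records into one split per worker.
    splits = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, splits))
    # Flatten the mapper outputs and aggregate them.
    pairs = [pair for chunk in mapped for pair in chunk]
    return reduce_task(pairs)


monthly_profit = run_node(RECORDS)
print(monthly_profit)
```

Sorting the resulting dictionary by month gives the profit trend over the year; in an actual Hadoop deployment the splits would come from HDFS blocks rather than an in-memory list.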
Pages: 572-575
Number of pages: 4
Related papers
50 in total
  • [41] Budget Constraint Scheduler for Big Data Using Hadoop MapReduce
    Vinutha D.C.
    Raju G.T.
    [J]. SN Computer Science, 2021, 2 (4)
  • [42] Addressing Big Data Problem Using Hadoop and Map Reduce
    Patel, Aditya B.
    Birla, Manashvi
    Nair, Ushma
    [J]. 3RD NIRMA UNIVERSITY INTERNATIONAL CONFERENCE ON ENGINEERING (NUICONE 2012), 2012,
  • [43] Analyzing Relationships in Terrorism Big Data Using Hadoop and Statistics
    Strang, Kenneth David
    Sun, Zhaohao
    [J]. JOURNAL OF COMPUTER INFORMATION SYSTEMS, 2017, 57 (01) : 67 - 75
  • [44] Social-Media Data Analysis Using Tessera Framework in the Hadoop Cluster Environment
    Sarnovsky, Martin
    Butka, Peter
    Paulina, Jakub
    [J]. INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY - ISAT 2016 - PT II, 2017, 522 : 239 - 251
  • [45] Web Log Data Preprocessing using Raspberry Pi Cluster and hadoop cluster
    Svec, Peter
    Chylo, Lukas
    Filipik, Jakub
    [J]. DIVAI 2018: 12TH INTERNATIONAL SCIENTIFIC CONFERENCE ON DISTANCE LEARNING IN APPLIED INFORMATICS, 2018, : 513 - 521
  • [46] Frequent Item set Using Abundant Data on Hadoop Clusters in Big Data
    Danapaquiame, N.
    Balaji, V.
    Gayathri, R.
    Kodhai, E.
    Sambasivam, G.
    [J]. BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2018, 11 (01) : 104 - 112
  • [47] Research on Industry Data Analysis Model Based on Hadoop Big Data Platform
    Xu, Hongsheng
    Fan, Ganglong
    Li, Ke
    [J]. PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON EDUCATION, MANAGEMENT, INFORMATION AND COMPUTER SCIENCE (ICEMC 2017), 2017, 73 : 783 - 787
  • [48] Big data and Spark: Comparison with Hadoop
    Benlachmi, Yassine
    Hasnaoui, Moulay Lahcen
    [J]. PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020, : 811 - 817
  • [49] Handling Big Data with Hadoop Toolkit
    Devakunchari, R.
    [J]. 2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [50] A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop
    Pal, Amrit
    Agrawal, Pinki
    Jain, Kunal
    Agrawal, Sanjay
    [J]. 2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 587 - 591