Analysis of Apache Logs Using Hadoop and Hive

Authors
Velinov, Aleksandar [1]
Zdravev, Zoran [1]
Affiliations
[1] Goce Delcev Univ, Fac Comp Sci, Krste Misirkov 10-A, Stip, Macedonia
Keywords
Logs; Hadoop; Hive; analysis
DOI
10.18421/TEM73-22
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper analyzes Apache web logs using the Cloudera Hadoop distribution, with Hive used to query the log data. We used publicly available web logs from the NASA Kennedy Space Center server. HDFS (Hadoop Distributed File System) served as the log container: the Apache web logs were copied to HDFS from the local file system. We analyzed the total number of hits, the number of unique IPs, the most common hosts that made requests to the NASA server in Florida, and the most common types of errors. We also examined the ratio between the number of rows in the logs and the execution time.
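The metrics named in the abstract (total hits, unique hosts, most common hosts, most common errors) can be illustrated on a handful of log lines. The sketch below is plain Python, not the Hive queries used in the paper, and it assumes the logs follow the Apache Common Log Format. The function name `summarize`, the regular expression, and the sample lines are all hypothetical and invented for illustration; they are not taken from the paper or from the NASA data set.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
#   host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines, top_n=3):
    """Count hits, distinct hosts, top hosts, and error status codes."""
    hosts = Counter()
    errors = Counter()
    total = 0
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip rows that do not match the assumed format
        total += 1
        hosts[match.group("host")] += 1
        status = int(match.group("status"))
        if status >= 400:  # treat 4xx and 5xx responses as errors
            errors[status] += 1
    return {
        "total_hits": total,
        "unique_hosts": len(hosts),
        "top_hosts": hosts.most_common(top_n),
        "errors": errors.most_common(),
    }


# Invented sample rows, for illustration only.
SAMPLE = [
    'host-a.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024',
    'host-a.example.com - - [01/Jul/1995:00:00:05 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'host-b.example.com - - [01/Jul/1995:00:00:09 -0400] "GET /index.html HTTP/1.0" 200 1024',
    '192.0.2.7 - - [01/Jul/1995:00:00:12 -0400] "GET /cgi-bin/form HTTP/1.0" 500 -',
    'host-a.example.com - - [01/Jul/1995:00:00:15 -0400] "GET /gone.html HTTP/1.0" 404 -',
    'not a log line',
]

if __name__ == "__main__":
    print(summarize(SAMPLE, top_n=1))
```

In a Hive setting, each of these counts would more likely be expressed as an aggregate query over a table defined on the log files in HDFS; the Python version only shows what is being counted.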
Pages: 645-650
Page count: 6