Hive-Based Anomaly Detection in Hadoop Log Data Management

Authors
Son, Siwoon [1]
Gil, Myeong-Seon [1]
Yang, Seokwoo [1]
Moon, Yang-Sae [1]
Affiliation
[1] Kangwon Natl Univ, Dept Comp Sci, Chunchon, South Korea
Keywords
Anomaly detection; Big data; Log data; Apache Hadoop; Apache Hive; Moving average; 3-Sigma
DOI
10.1007/978-981-10-3023-9_129
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we address how to manage and analyze large volumes of log data, which have been difficult to handle in traditional computing environments. To handle the large volume of Hadoop log data generated rapidly across multiple servers, we present a new data storage architecture that enables efficient analysis of this big log data through Apache Hive. We then design and implement a simple but efficient anomaly detection method that identifies abnormal server states from log data using moving-average and 3-sigma techniques. Finally, we demonstrate the effectiveness of the proposed method by showing that it properly detects anomalies in Hadoop log data.
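The abstract names moving average and 3-sigma as the basis of the detector but does not give the exact formulation. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the input is one server's time series of a numeric log metric (hypothetically, the number of ERROR entries per minute, such as a Hive aggregation query might produce), the moving average and standard deviation are computed over a trailing window of the `window` preceding points, and a point is flagged when it deviates from that average by more than `k` = 3 standard deviations. The function name `detect_anomalies`, the window size, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_anomalies(values: Sequence[float], window: int = 5, k: float = 3.0) -> List[int]:
    """Return the indices t where values[t] lies more than k standard
    deviations from the moving average of the `window` preceding values.

    Assumption (not from the paper): mean and sigma are taken over a
    trailing window that excludes the current point.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    anomalies: List[int] = []
    for t in range(window, len(values)):
        history = values[t - window:t]       # trailing window, excludes values[t]
        mu = mean(history)                   # moving average
        sigma = pstdev(history)              # population std. dev. of the window
        if abs(values[t] - mu) > k * sigma:  # 3-sigma rule when k == 3
            anomalies.append(t)
    return anomalies


if __name__ == "__main__":
    # Hypothetical per-minute ERROR counts for a single server.
    counts = [10, 12, 11, 9, 10, 11, 12, 10, 48, 11, 10]
    print(detect_anomalies(counts, window=5))  # → [8]
```

Because the window excludes the current point, a spike is compared against the recent baseline instead of diluting it. Once the spike enters the window, however, it inflates the standard deviation for the next `window` points, which can mask anomalies that follow closely. To cover multiple servers, the sketch could simply be run per server, e.g. `{host: detect_anomalies(s) for host, s in series_by_host.items()}`, where `series_by_host` is a hypothetical mapping from host name to its series.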
Pages: 837-842
Number of pages: 6