Connecting the dots: anomaly and discontinuity detection in large-scale systems

Cited by: 3
Authors
Malik, Haroon [1]
Davis, Ian J. [2]
Godfrey, Michael W. [2]
Neuse, Douglas [3]
Manskovskii, Serge [3]
Affiliations
[1] Marshall Univ, Weisberg Div Comp Sci, Huntington, WV 25755 USA
[2] Univ Waterloo, David R Cheriton Sch Comp, Waterloo, ON, Canada
[3] CA Technol, CA Labs, Redwood City, CA USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Forecast; Datacentre; Anomaly; Discontinuity; Load tests
DOI
10.1007/s12652-016-0381-4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cloud providers and data centers rely heavily on forecasts to predict future workload accurately. These forecasts support appropriate virtualization and cost-effective provisioning of infrastructure. The accuracy of a forecast depends greatly on the quality of the performance data fed to the underlying algorithms. A fundamental problem analysts face when preparing data for forecasting is the timely identification of data discontinuities. A discontinuity is an abrupt change in the time-series pattern of a performance counter that persists but does not recur. Analysts need to identify discontinuities in performance data so that they can (a) remove the discontinuities from the data before building a forecast model and (b) retrain an existing forecast model on the performance data from the point in time where a discontinuity occurred. Several approaches and tools exist to help analysts identify anomalies in performance data. However, no automated approach exists to assist data center operators in detecting discontinuities. In this paper, we present and evaluate an approach that helps data center analysts and cloud providers detect discontinuities automatically. A case study on performance data obtained from a large cloud provider, together with performance tests conducted using an open source benchmark system, shows that the approach achieves, on average, a precision of 84% and a recall of 88%. The approach does not require any domain knowledge to operate.
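To make the abstract's definition concrete, the following is a minimal illustrative sketch of how a persistent level shift (a discontinuity) differs from a short-lived spike (an anomaly) in a performance counter. It is not the approach proposed in the paper, whose algorithm is not described in this record. The function name `find_discontinuity`, the `window` and `threshold` parameters, and the sample data are all hypothetical, and the sketch checks only persistence, not whether the change recurs.

```python
from statistics import mean, median

def find_discontinuity(series, window=5, threshold=10.0):
    """Return the index where a persistent level shift starts, or None.

    Illustrative assumption, not the paper's method: a shift "persists" if
    the medians of the `window` samples before and after an index differ by
    more than `threshold`. Medians ignore isolated spikes.
    """
    best_index, best_jump = None, 0.0
    for i in range(window, len(series) - window + 1):
        before, after = series[i - window:i], series[i:i + window]
        # Gate on medians so a single outlier cannot trigger a detection.
        if abs(median(after) - median(before)) <= threshold:
            continue
        # Among the surviving candidates, the difference of means peaks
        # at the point where the level actually changes.
        jump = abs(mean(after) - mean(before))
        if jump > best_jump:
            best_index, best_jump = i, jump
    return best_index

# Hypothetical CPU-utilisation samples (percent), made up for illustration.
spike = [20, 21, 19, 20, 22, 20, 95, 21, 20, 19, 21, 20, 22, 20]
shift = [20, 21, 19, 20, 22, 20, 21, 60, 61, 59, 62, 60, 61, 60]

print(find_discontinuity(spike))  # None: the spike does not persist
print(find_discontinuity(shift))  # 7: the level changes at index 7 and stays
```

In line with use (b) in the abstract, an analyst could retrain a forecast model on `shift[7:]` only. A fixed window and threshold are simplifications chosen for readability; real counters would need tuning or a more robust statistical test.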
Pages: 509-522
Page count: 14
Related papers (50 in total)
  • [1] Connecting the dots: anomaly and discontinuity detection in large-scale systems
    Haroon Malik
    Ian J. Davis
    Michael W. Godfrey
    Douglas Neuse
    Serge Manskovskii
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2016, 7 : 509 - 522
  • [2] Hierarchical Anomaly Detection and Multimodal Classification in Large-Scale Photovoltaic Systems
    Zhao, Yingying
    Liu, Qi
    Li, Dongsheng
    Kang, Dahai
    Lv, Qin
    Shang, Li
    [J]. IEEE TRANSACTIONS ON SUSTAINABLE ENERGY, 2019, 10 (03) : 1351 - 1361
  • [3] Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems
    Zhang, Shenglin
    Pan, Zhongjie
    Liu, Heng
    Jin, Pengxiang
    Sun, Yongqian
    Ouyang, Qianyu
    Wang, Jiaju
    Jia, Xueying
    Zhang, Yuzhi
    Yang, Hui
    Zou, Yongqiang
    Pei, Dan
    [J]. 2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023, : 69 - 79
  • [4] Privatized Distributed Anomaly Detection for Large-Scale Nonlinear Uncertain Systems
    Rostampour, Vahab
    Ferrari, Riccardo M. G.
    Teixeira, Andre M. H.
    Keviczky, Tamas
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (11) : 5299 - 5313
  • [5] Anomaly Detection in a Large-scale Cloud Platform
    Islam, Mohammad S.
    Pourmajidi, William
    Zhang, Lei
    Steinbacher, John
    Erwin, Tony
    Miranskyy, Andriy
    [J]. 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE (ICSE-SEIP 2021), 2021, : 150 - 159
  • [6] Execution anomaly detection in large-scale systems through console log analysis
    Bao, Liang
    Li, Qian
    Lu, Peiyao
    Lu, Jie
    Ruan, Tongxiao
    Zhang, Ke
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2018, 143 : 172 - 186
  • [7] Crowdsourcing based large-scale network anomaly detection
    Li, Yang
    Huang, Wenguang
    Tian, Xiaohua
    [J]. 2018 10TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), 2018,
  • [8] Robust Anomaly Detection for Large-Scale Sensor Data
    Chakrabarti, Aniket
    Marwah, Manish
    Arlitt, Martin
    [J]. BUILDSYS'16: PROCEEDINGS OF THE 3RD ACM CONFERENCE ON SYSTEMS FOR ENERGY-EFFICIENT BUILT ENVIRONMENTS, 2016, : 31 - 40
  • [9] Anomaly detection in large-scale data stream networks
    Duc-Son Pham
    Venkatesh, Svetha
    Lazarescu, Mihai
    Budhaditya, Saha
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 28 (01) : 145 - 189
  • [10] Anomaly detection in large-scale data stream networks
    Duc-Son Pham
    Svetha Venkatesh
    Mihai Lazarescu
    Saha Budhaditya
    [J]. Data Mining and Knowledge Discovery, 2014, 28 : 145 - 189