PerfCompass: Online Performance Anomaly Fault Localization and Inference in Infrastructure-as-a-Service Clouds

Cited by: 24
Authors
Dean, Daniel J. [1]
Nguyen, Hiep [1]
Wang, Peipei [1]
Gu, Xiaohui [1]
Sailer, Anca [2]
Kochut, Andrzej [2]
Affiliations
[1] North Carolina State University, Department of Computer Science, Raleigh, NC 27603, USA
[2] IBM Corporation, T.J. Watson Research Center, Yorktown Heights, NY, USA
Keywords
Reliability, availability, and serviceability; debugging aids; distributed debugging; performance
DOI
10.1109/TPDS.2015.2444392
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Infrastructure-as-a-service clouds are becoming widely adopted. However, resource sharing and multi-tenancy have made performance anomalies a top concern for users. Timely debugging of those anomalies is paramount for minimizing the performance penalty for users. Unfortunately, this debugging often takes a long time due to the inherent complexity and shared nature of cloud infrastructures. When an application experiences a performance anomaly, it is important to distinguish between faults with a global impact and faults with a local impact, because the diagnosis and recovery steps for the two kinds of fault are quite different. In this paper, we present PerfCompass, an online performance anomaly fault debugging tool that can quantify whether a production-run performance anomaly has a global impact or a local impact. PerfCompass can use this information to suggest the root cause as either an external fault (e.g., environment-based) or an internal fault (e.g., software bugs). Furthermore, PerfCompass can identify the top affected system calls to provide useful diagnostic hints for detailed performance debugging. PerfCompass does not require source code or runtime application instrumentation, which makes it practical for production systems. We have tested PerfCompass by running five common open source systems (Apache, MySQL, Tomcat, Hadoop, and Cassandra) inside a virtualized cloud testbed. Our experiments use a range of common infrastructure sharing issues and real software bugs. The results show that PerfCompass accurately classifies 23 out of the 24 tested cases without calibration and achieves 100 percent accuracy with calibration. PerfCompass provides useful diagnostic hints within several minutes and imposes negligible runtime overhead on the production system during normal execution.
Pages: 1742-1755
Number of pages: 14