Automated Debugging in Data-Intensive Scalable Computing

Cited by: 16
Authors
Gulzar, Muhammad Ali [1]
Interlandi, Matteo [2]
Han, Xueyuan [3]
Li, Mingda [1]
Condie, Tyson [1]
Kim, Miryung [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Microsoft, Redmond, WA USA
[3] Harvard Univ, Cambridge, MA 02138 USA
Keywords
Automated debugging; fault localization; data provenance; data-intensive scalable computing (DISC); big data; data cleaning; PROVENANCE
DOI
10.1145/3127479.3131624
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Developing Big Data Analytics workloads often involves trial-and-error debugging, due to the unclean nature of datasets or wrong assumptions made about data. When errors (e.g., a program crash or outlier results) arise, developers are often interested in identifying a subset of the input data that is able to reproduce the problem. BIGSIFT is a new faulty data localization approach that combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. BIGSIFT redefines data provenance for the purpose of debugging using a test oracle function and implements several unique optimizations, specifically geared towards the iterative nature of automated debugging workloads. BIGSIFT improves the accuracy of fault localizability by several orders of magnitude (~10^3 to 10^7×) compared to Titian data provenance, and improves performance by up to 66× compared to Delta Debugging, an automated fault-isolation technique. For each faulty output, BIGSIFT is able to localize fault-inducing data within 62% of the original job running time.
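As a rough illustration of the Delta Debugging baseline named in the abstract, the sketch below implements the classic ddmin search over input records, driven by a test oracle. It is not BigSift's implementation: the data provenance step and the optimizations mentioned in the abstract are not shown, and the names `ddmin` and `fails`, as well as the sample records, are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(records: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `records` to a 1-minimal subset on which the oracle `fails` holds.

    `fails` is a hypothetical test oracle: it returns True when running the
    job on the given subset still reproduces the faulty outcome.
    """
    current = list(records)
    if not fails(current):
        raise ValueError("the full input must reproduce the failure")

    n = 2  # number of partitions to split the current input into
    while len(current) >= 2:
        size = len(current)
        bounds = [size * i // n for i in range(n + 1)]
        chunks = [current[bounds[i]:bounds[i + 1]] for i in range(n)]
        reduced = False

        # Step 1: does a single chunk reproduce the failure on its own?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break

        # Step 2: otherwise, does removing one chunk keep the failure?
        if not reduced:
            for i in range(n):
                complement = [r for j, c in enumerate(chunks) if j != i for r in c]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break

        # Step 3: no progress, so split finer or stop at single-record chunks.
        if not reduced:
            if n >= size:
                break
            n = min(n * 2, size)

    return current


# Hypothetical example: the job "fails" whenever a negative record is present.
records = [3, 8, -1, 5, 12, 7, 4, 9]
print(ddmin(records, lambda subset: any(r < 0 for r in subset)))  # prints [-1]
```

Each call to the oracle stands in for re-running the job on a subset of the input, which is why the number of oracle calls dominates the cost of this kind of search.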
Pages: 520-534
Number of pages: 15
Related papers
50 in total
  • [1] BigSift: Automated Debugging of Big Data Analytics in Data-Intensive Scalable Computing
    Gulzar, Muhammad Ali
    Wang, Siman
    Kim, Miryung
    ESEC/FSE'18: PROCEEDINGS OF THE 2018 26TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2018 : 863 - 866
  • [2] Data-intensive workflow management: For clouds and data-intensive and scalable computing environments
    De Oliveira, Daniel C.M.
    Liu, Ji
    Pacitti, Esther
    Synthesis Lectures on Data Management, 2019, 14 (04) : 1 - 179
  • [3] Data-Intensive Scalable Computing for Scientific Applications
    Bryant, Randal E.
    COMPUTING IN SCIENCE & ENGINEERING, 2011, 13 (06) : 25 - 33
  • [4] Special Issue on Data-Intensive Scalable Computing Systems
    Roth, Philip C.
    Canon, R. Shane
    PARALLEL COMPUTING, 2017, 61 : 1 - 2
  • [5] Scalable Data-Intensive Analytics
    Hsu, Meichun
    Chen, Qiming
    BUSINESS INTELLIGENCE FOR THE REAL-TIME ENTERPRISE, 2009, 27 : 97 - +
  • [6] Scalable Pointer-based Memory Protection for Data-intensive Computing
    An, Baik Song
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020 : 1602 - 1604
  • [7] Applications in Data-Intensive Computing
    Shah, Anuj R.
    Adkins, Joshua N.
    Baxter, Douglas J.
    Cannon, William R.
    Chavarria-Miranda, Daniel G.
    Choudhury, Sutanay
    Gorton, Ian
    Gracio, Deborah K.
    Halter, Todd D.
    Jaitly, Navdeep D.
    Johnson, John R.
    Kouzes, Richard T.
    Macduff, Matthew C.
    Marquez, Andres
    Monroe, Matthew E.
    Oehmen, Christopher S.
    Pike, William A.
    Scherrer, Chad
    Villa, Oreste
    Webb-Robertson, Bobbie-Jo
    Whitney, Paul D.
    Zuljevic, Nino
    ADVANCES IN COMPUTERS, VOL 79, 2010, 79 : 1 - 70
  • [8] Data-intensive computing and digital libraries
    Moore, R
    Prince, TA
    Ellisman, M
    COMMUNICATIONS OF THE ACM, 1998, 41 (11) : 56 - 62
  • [9] Technology Prospects for Data-Intensive Computing
    Akarvardar, Kerem
    Wong, H-S Philip
    PROCEEDINGS OF THE IEEE, 2023, 111 (01) : 92 - 112
  • [10] Support for data-intensive computing with CloudMan
    Kowsar, Y.
    Afgan, E.
    2013 36TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2013 : 243 - 248