Automated Debugging in Data-Intensive Scalable Computing

Cited by: 16
Authors
Gulzar, Muhammad Ali [1]
Interlandi, Matteo [2]
Han, Xueyuan [3]
Li, Mingda [1]
Condie, Tyson [1]
Kim, Miryung [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Microsoft, Redmond, WA USA
[3] Harvard Univ, Cambridge, MA 02138 USA
Keywords
Automated debugging; fault localization; data provenance; data-intensive scalable computing (DISC); big data; data cleaning
DOI
10.1145/3127479.3131624
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Developing Big Data Analytics workloads often involves trial-and-error debugging, due to the unclean nature of datasets or wrong assumptions made about the data. When errors (e.g., program crashes or outlier results) arise, developers are often interested in identifying a subset of the input data that is able to reproduce the problem. BIGSIFT is a new faulty data localization approach that combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. BIGSIFT redefines data provenance for the purpose of debugging using a test oracle function and implements several unique optimizations specifically geared towards the iterative nature of automated debugging workloads. BIGSIFT improves the accuracy of fault localizability by several orders of magnitude (approximately 10^3× to 10^7×) compared to Titian data provenance, and improves performance by up to 66× compared to Delta Debugging, an automated fault-isolation technique. For each faulty output, BIGSIFT is able to localize fault-inducing data within 62% of the original job running time.
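For orientation, the following is a minimal sketch of classic Delta Debugging, the baseline technique the abstract compares against. It is not BIGSIFT's implementation and does not use data provenance or any of its optimizations. The function name `ddmin`, the oracle parameter `fails`, and the sample records are hypothetical, chosen only for illustration; the oracle plays the role of the test oracle function and returns True when a subset of the input still reproduces the failure.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(records: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `records` to a subset on which the oracle `fails` still
    returns True and from which no single record can be removed (1-minimal).

    Hypothetical helper: a simplified form of classic Delta Debugging.
    """
    current = list(records)
    if not fails(current):
        raise ValueError("the full input must reproduce the failure")

    n = 2  # granularity: number of chunks the input is split into
    while len(current) >= 2:
        size = math.ceil(len(current) / n)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False

        # Step 1: does any single chunk reproduce the failure on its own?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break

        # Step 2: otherwise, does the input minus one chunk still fail?
        if not reduced:
            for i in range(len(chunks)):
                complement = [r for j, c in enumerate(chunks) if j != i for r in c]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break

        # Step 3: otherwise split more finely, or stop at single-record chunks.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)

    return current


# Illustrative data: the failure is triggered by any negative record.
sample = [3, 8, -1, 5, 12, 7, 9, 4]
culprit = ddmin(sample, lambda subset: any(x < 0 for x in subset))
# culprit == [-1]
```

Each call to the oracle here stands in for re-running the analytics job on a candidate subset, which is why the number of oracle calls dominates the cost of this kind of search on large inputs.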
Pages: 520-534
Page count: 15