Data-Driven Test Selection at Scale

Cited by: 1
Authors
Mehta, Sonu [1]
Farmahinifarahani, Farima [2]
Bhagwan, Ranjita [1]
Guptha, Suraj [3]
Jafari, Sina [3]
Kumar, Rahul [1]
Saini, Vaibhav [3]
Santhiar, Anirudh [3]
Affiliations
[1] Microsoft Research, Bangalore, Karnataka, India
[2] University of California, Irvine, Irvine, CA, USA
[3] Microsoft Corporation, Redmond, WA 98052, USA
Keywords
test selection; continuous integration; statistical models; regression test selection
DOI
10.1145/3468264.3473916
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but running every test after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models address this issue by running only a subset of tests for each change. In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction techniques. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both a reduction in resource cost and a reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15% to 30% of compute time while reporting back more than ≈99% of buggy pull requests.
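The abstract does not describe the model itself, so the following is only a minimal sketch of the general idea of history-based statistical test selection and of two of the quantities the abstract reports (compute time saved and share of buggy pull requests caught). It is not the authors' model. The function names, the per-file failure-rate heuristic, the `threshold` parameter, and the `test_cost` mapping are all assumptions introduced for illustration; turnaround time is not modelled here.

```python
from collections import defaultdict


def fit_failure_rates(history):
    """Estimate how often each test failed when a given file was changed.

    history: list of (changed_files, failed_tests) pairs from past full runs.
    Returns {(file, test): empirical failure rate}. Hypothetical heuristic.
    """
    changes = defaultdict(int)  # file -> number of PRs that touched it
    fails = defaultdict(int)    # (file, test) -> PRs touching file where test failed
    for changed_files, failed_tests in history:
        for f in set(changed_files):
            changes[f] += 1
            for t in failed_tests:
                fails[(f, t)] += 1
    return {(f, t): n / changes[f] for (f, t), n in fails.items()}


def select_tests(rates, changed_files, all_tests, threshold=0.05):
    """Pick tests whose best failure rate over the changed files meets the threshold."""
    selected = set()
    for t in all_tests:
        score = max((rates.get((f, t), 0.0) for f in changed_files), default=0.0)
        if score >= threshold:
            selected.add(t)
    return selected


def evaluate(rates, prs, test_cost, threshold=0.05):
    """Return (fraction of compute time saved, fraction of buggy PRs caught).

    A PR counts as buggy if any test failed in the full run, and as caught
    if at least one of its failing tests was selected.
    """
    total = spent = 0.0
    buggy = caught = 0
    for changed_files, failed_tests in prs:
        chosen = select_tests(rates, changed_files, test_cost, threshold)
        total += sum(test_cost.values())
        spent += sum(test_cost[t] for t in chosen)
        if failed_tests:
            buggy += 1
            caught += bool(chosen & set(failed_tests))
    return 1 - spent / total, (caught / buggy if buggy else 1.0)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    history = [(["a.py"], {"t1"}), (["a.py"], set()), (["b.py"], {"t2"})]
    rates = fit_failure_rates(history)
    saved, capture = evaluate(rates, history, {"t1": 1.0, "t2": 3.0})
    print(f"compute saved: {saved:.2f}, buggy PRs caught: {capture:.2f}")
```

Raising `threshold` in this sketch trades a lower capture rate for more compute saved, which is the trade-off the reported metrics are meant to expose.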
Pages: 1225-1235
Page count: 11