Data-Driven Test Selection at Scale

Cited by: 1
Authors
Mehta, Sonu [1]
Farmahinifarahani, Farima [2]
Bhagwan, Ranjita [1]
Guptha, Suraj [3]
Jafari, Sina [3]
Kumar, Rahul [1]
Saini, Vaibhav [3]
Santhiar, Anirudh [3]
Affiliations
[1] Microsoft Res, Bangalore, Karnataka, India
[2] Univ Calif Irvine, Irvine, CA USA
[3] Microsoft Corp, Redmond, WA 98052 USA
Keywords
test selection; continuous integration; statistical models; regression test selection
DOI
10.1145/3468264.3473916
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models address this issue by running a subset of tests for every change. In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction techniques. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both a reduction in resource cost and a reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15% to 30% of compute time while reporting back more than approximately 99% of buggy pull requests.
Pages: 1225-1235
Number of pages: 11