Data-Driven Test Selection at Scale

Times cited: 1
Authors
Mehta, Sonu [1]
Farmahinifarahani, Farima [2]
Bhagwan, Ranjita [1]
Guptha, Suraj [3]
Jafari, Sina [3]
Kumar, Rahul [1]
Saini, Vaibhav [3]
Santhiar, Anirudh [3]
Affiliations
[1] Microsoft Research, Bangalore, Karnataka, India
[2] University of California, Irvine, Irvine, CA, USA
[3] Microsoft Corporation, Redmond, WA 98052, USA
Keywords
test selection; continuous integration; statistical models; regression test selection
DOI
10.1145/3468264.3473916
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models address this issue by running only a subset of tests for every change. In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both the reduction in resource cost and the reduction in pull-request turnaround time. Evaluating our model on 22 large repositories at Microsoft, we find that it saves 15% to 30% of compute time while still reporting roughly 99% or more of buggy pull requests.
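The abstract does not describe the model's internals, so what follows is only a minimal sketch, under stated assumptions, of how a history-based test selector and the two headline quantities (share of compute saved and share of buggy pull requests still caught) could be computed. The co-failure frequency rule, the `threshold` parameter, per-test costs, and the names `PullRequest`, `CoFailureSelector`, and `evaluate` are all hypothetical illustrations, not the authors' model or their proposed metrics; pull-request turnaround time is not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PullRequest:
    files: set[str]          # files touched by the pull request
    failing_tests: set[str]  # tests that failed on a full run (empty set = not buggy)


class CoFailureSelector:
    """Hypothetical frequency-based selector; NOT the model from the paper."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.file_changes = defaultdict(int)  # file -> number of PRs that touched it
        self.co_failures = defaultdict(int)   # (file, test) -> PRs where file changed and test failed

    def fit(self, history: list[PullRequest]) -> None:
        # Count how often each test failed when each file was changed.
        for pr in history:
            for f in pr.files:
                self.file_changes[f] += 1
                for t in pr.failing_tests:
                    self.co_failures[(f, t)] += 1

    def select(self, files: set[str], all_tests: set[str]) -> set[str]:
        selected = set()
        for t in all_tests:
            for f in files:
                n = self.file_changes.get(f, 0)
                if n == 0:
                    # Unseen file: no history, so run the test to stay conservative.
                    selected.add(t)
                    break
                # Empirical P(test fails | file changed) compared to the threshold.
                if self.co_failures.get((f, t), 0) / n >= self.threshold:
                    selected.add(t)
                    break
        return selected


def evaluate(selector: CoFailureSelector, prs: list[PullRequest],
             all_tests: set[str], test_cost: dict[str, float]) -> tuple[float, float]:
    """Return (fraction of test compute saved, fraction of buggy PRs still caught)."""
    full_cost = selected_cost = 0.0
    buggy = caught = 0
    for pr in prs:
        chosen = selector.select(pr.files, all_tests)
        full_cost += sum(test_cost[t] for t in all_tests)
        selected_cost += sum(test_cost[t] for t in chosen)
        if pr.failing_tests:
            buggy += 1
            # A buggy PR counts as caught if at least one of its failing tests was run.
            if chosen & pr.failing_tests:
                caught += 1
    saving = 1 - selected_cost / full_cost if full_cost else 0.0
    recall = caught / buggy if buggy else 1.0
    return saving, recall
```

Running every test for files with no history trades some savings for recall, which mirrors the abstract's emphasis on catching nearly all buggy pull requests; a real system would also need to decide how to handle test flakiness and how often to refit on new history.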
Pages: 1225-1235
Page count: 11