Data-Driven Test Selection at Scale

Cited by: 1

Authors:

Mehta, Sonu [1]

Farmahinifarahani, Farima [2]

Bhagwan, Ranjita [1]

Guptha, Suraj [3]

Jafari, Sina [3]

Kumar, Rahul [1]

Saini, Vaibhav [3]

Santhiar, Anirudh [3]

Affiliations:

[1] Microsoft Res, Bangalore, Karnataka, India

[2] Univ Calif Irvine, Irvine, CA USA

[3] Microsoft Corp, Redmond, WA 98052 USA

Source:

PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) | 2021

Keywords:

test selection; continuous integration; statistical models; regression test selection

DOI:

10.1145/3468264.3473916

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models address this issue by running only a subset of tests for every change. In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction techniques. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both a reduction in resource cost and a reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15%-30% of compute time while reporting back more than approximately 99% of buggy pull requests.

Pages: 1225-1235

Page count: 11

