Data-Driven Test Selection at Scale

Cited by: 1

Authors:

Mehta, Sonu [1]

Farmahinifarahani, Farima [2]

Bhagwan, Ranjita [1]

Guptha, Suraj [3]

Jafari, Sina [3]

Kumar, Rahul [1]

Saini, Vaibhav [3]

Santhiar, Anirudh [3]

Affiliations:

[1] Microsoft Res, Bangalore, Karnataka, India

[2] Univ Calif Irvine, Irvine, CA USA

[3] Microsoft Corp, Redmond, WA 98052 USA

Source:

PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) | 2021

Keywords:

test selection; continuous integration; statistical models; regression test selection

DOI:

10.1145/3468264.3473916

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification code:

081202; 0835

Abstract:

Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models deal with this issue by running a subset of tests for every change. In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction techniques. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both a reduction in resource cost and a reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15% to 30% of compute time while reporting back more than ≈99% of buggy pull requests.


Pages: 1225-1235

Page count: 11
