Data-Driven Test Selection at Scale

Cited by: 1

Authors:

Mehta, Sonu [1]

Farmahinifarahani, Farima [2]

Bhagwan, Ranjita [1]

Guptha, Suraj [3]

Jafari, Sina [3]

Kumar, Rahul [1]

Saini, Vaibhav [3]

Santhiar, Anirudh [3]

Affiliations:

[1] Microsoft Res, Bangalore, Karnataka, India

[2] Univ Calif Irvine, Irvine, CA USA

[3] Microsoft Corp, Redmond, WA 98052 USA

Source:

PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) | 2021

Keywords:

test selection; continuous integration; statistical models; regression test selection

DOI:

10.1145/3468264.3473916

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models deal with this issue by running a subset of tests for every change. In this paper, we present a generic, language-agnostic and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction techniques. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both a reduction in resource cost and a reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15% to 30% of compute time while reporting back more than ≈99% of buggy pull requests.

Pages: 1225-1235

Page count: 11
