Data-Driven Test Selection at Scale

Cited by: 1
Authors
Mehta, Sonu [1]
Farmahinifarahani, Farima [2]
Bhagwan, Ranjita [1]
Guptha, Suraj [3]
Jafari, Sina [3]
Kumar, Rahul [1]
Saini, Vaibhav [3]
Santhiar, Anirudh [3]
Affiliations
[1] Microsoft Research, Bangalore, Karnataka, India
[2] University of California, Irvine, Irvine, CA, USA
[3] Microsoft Corporation, Redmond, WA 98052, USA
Keywords
test selection; continuous integration; statistical models; regression test selection
DOI
10.1145/3468264.3473916
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models address this issue by running a subset of tests for every change. In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction techniques. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both a reduction in resource cost and a reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15% to 30% of compute time while reporting back more than approximately 99% of buggy pull requests.
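The two headline quantities in the abstract (share of buggy pull requests captured, and compute time saved) can be illustrated with a minimal sketch. This record does not give the paper's exact metric definitions, so everything here is an assumption for illustration: the `PullRequest` record, its field names, and the rule that a buggy pull request counts as captured when at least one of its failing tests is among the selected tests.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical per-PR record; field names are illustrative assumptions."""
    test_runtimes: dict[str, float]  # test name -> runtime (s) if all tests ran
    failing: set[str]                # tests that fail on this PR (empty if clean)
    selected: set[str]               # tests the selection model chose to run


def buggy_pr_capture_rate(prs: list[PullRequest]) -> float:
    """Fraction of buggy PRs for which at least one failing test was selected."""
    buggy = [pr for pr in prs if pr.failing]
    if not buggy:
        return 1.0  # nothing to miss
    caught = sum(1 for pr in buggy if pr.failing & pr.selected)
    return caught / len(buggy)


def compute_time_saved(prs: list[PullRequest]) -> float:
    """Fraction of total test runtime avoided by running only selected tests."""
    total = sum(sum(pr.test_runtimes.values()) for pr in prs)
    if total == 0:
        return 0.0
    spent = sum(
        runtime
        for pr in prs
        for name, runtime in pr.test_runtimes.items()
        if name in pr.selected
    )
    return 1.0 - spent / total


# Toy data: three PRs sharing the same suite of three tests (100 s in total).
suite = {"a": 10.0, "b": 30.0, "c": 60.0}
prs = [
    PullRequest(suite, failing={"b"}, selected={"a", "b"}),  # buggy, captured
    PullRequest(suite, failing=set(), selected={"a"}),       # clean
    PullRequest(suite, failing={"c"}, selected={"a", "b"}),  # buggy, missed
]

capture = buggy_pr_capture_rate(prs)  # 1 of 2 buggy PRs captured
saved = compute_time_saved(prs)       # 90 s run out of 300 s possible
```

On this toy data the capture rate is 0.5 and the saved fraction is 0.7; the trade-off between the two is what a test selection model has to balance.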
Pages: 1225-1235
Page count: 11