RTPTorrent: An Open-source Dataset for Evaluating Regression Test Prioritization

Cited by: 11
Authors
Mattis, Toni [1]
Rein, Patrick [1]
Duersch, Falco [1]
Hirschfeld, Robert [1]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Keywords
Regression Test Prioritization; TravisCI; GitHub; Java; Dataset; SEQUENCE APPROACH; CONSTRAINTS; SOFTWARE
DOI
10.1145/3379597.3387458
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
The software engineering practice of automated testing helps programmers find defects earlier during development. With growing software projects and longer-running test suites, frequency and immediacy of feedback decline, thereby making defects harder to repair. Regression test prioritization (RTP) is concerned with running relevant tests earlier to lower the costs of defect localization and to improve feedback. Finding representative data to evaluate RTP techniques is non-trivial, as most software is published without failing tests. In this work, we systematically survey a wide range of RTP literature regarding whether the datasets used rely on real or synthetic defects or tests, whether they are publicly available, and whether they are reused. We observed that some datasets are reused; however, many studies cover only a few projects, and these rarely resemble real-world development activity. In light of these threats to ecological validity, we describe the construction and characteristics of a new dataset, named RTPTorrent, based on 20 open-source Java programs. Our dataset allows researchers to evaluate prioritization heuristics based on version control meta-data, source code, and test results from fine-grained, automated builds over 9 years of development history. We provide reproducible baselines for initial comparisons and make all data publicly available. We see this as a step towards better reproducibility, ecological validity, and long-term availability of studied software in the field of test prioritization.
Pages: 385-396
Page count: 12