Estimating the number of remaining links in traceability recovery

Cited by: 0
Authors
Davide Falessi
Massimiliano Di Penta
Gerardo Canfora
Giovanni Cantone
Affiliations
[1] California Polytechnic State University, Department of Computer Science
[2] University of Sannio, Department of Engineering
[3] University of Rome Tor Vergata, Department of Civil Engineering and Computer Science
[4] DICII
Keywords
Information retrieval; Traceability link recovery; Metrics and measurement
DOI
Not available
Abstract
Although very important in software engineering, establishing traceability links between software artifacts is extremely tedious and error-prone, and it requires significant effort. Even when approaches for automated traceability recovery exist, they provide the requirements analyst with a usually very long ranked list of candidate links that must be manually inspected. In this paper we introduce an approach called Estimation of the Number of Remaining Links (ENRL), which aims at estimating, via Machine Learning (ML) classifiers, the number of remaining positive links in a ranked list of candidate traceability links produced by a recovery approach based on Natural Language Processing (NLP) techniques. We evaluated the accuracy of the ENRL approach by considering several ML classifiers and NLP techniques on three datasets from industry and academia, covering traceability links among different kinds of software artifacts, including requirements, use cases, design documents, source code, and test cases. Results from our study indicate that: (i) specific estimation models are able to provide accurate estimates of the number of remaining positive links; (ii) the estimation accuracy depends on the choice of the NLP technique; and (iii) univariate estimation models outperform multivariate ones.
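The general idea in the abstract can be illustrated with a minimal sketch. This is not the ENRL implementation from the paper: it assumes a single predictor (the similarity score of each candidate link), a plain logistic model fitted on the links the analyst has already inspected, and an estimate obtained by summing the predicted probabilities over the uninspected part of the ranked list. The function names, learning settings, and all data values are hypothetical.

```python
import math


def fit_logistic(scores, labels, lr=0.5, epochs=2000):
    """Fit p(link | score) = sigmoid(w * score + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(w * s + b)))
            grad_w += (p - y) * s
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def estimate_remaining(model, remaining_scores):
    """Expected number of positive links: sum of predicted probabilities."""
    w, b = model
    return sum(1.0 / (1.0 + math.exp(-(w * s + b))) for s in remaining_scores)


# Hypothetical data: similarity scores of the top-ranked candidates the analyst
# has already inspected, with 1 = true link and 0 = false positive.
inspected_scores = [0.92, 0.88, 0.81, 0.77, 0.70, 0.66, 0.61, 0.55]
inspected_labels = [1, 1, 1, 0, 1, 0, 0, 0]

# Hypothetical scores of the candidates that have not been inspected yet.
remaining_scores = [0.50, 0.45, 0.40, 0.33, 0.27, 0.20, 0.12, 0.05]

model = fit_logistic(inspected_scores, inspected_labels)
estimate = estimate_remaining(model, remaining_scores)
print(f"Estimated remaining positive links: {estimate:.2f} of {len(remaining_scores)}")
```

An analyst could compare such an estimate against the cost of inspecting the rest of the list when deciding whether to stop. The choice of a single-predictor model here is only for brevity.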
Pages: 996-1027
Page count: 31
Related papers
50 in total
  • [41] Query-driven soft traceability links for models
    Hegedus, Abel
    Horvath, Akos
    Rath, Istvan
    Starr, Rodrigo Rizzi
    Varro, Daniel
    SOFTWARE AND SYSTEMS MODELING, 2016, 15 (03): 733 - 756
  • [42] Traceability in the Wild: Automatically Augmenting Incomplete Trace Links
    Rath, Michael
    Rendall, Jacob
    Guo, Jin L. C.
    Cleland-Huang, Jane
    Mader, Patrick
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018: 834 - 845
  • [43] Recovering Traceability Links Between Code and Documentation: A Retrospective
    Antoniol, Giulio
    Canfora, Gerardo
    Casazza, Gerardo
    De Lucia, Andrea
    Merlo, Ettore
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, 51 (03): 825 - 832
  • [44] Adapting Word Embeddings to Traceability Recovery
    Tian, Qingsong
    Cao, Qinghua
    Sun, Qing
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND COMPUTER AIDED EDUCATION (ICISCAE 2018), 2018: 255 - 261
  • [45] Traceability recovery in RAD software systems
    Di Penta, M
    Gradara, S
    Antoniol, G
    10TH INTERNATIONAL WORKSHOP ON PROGRAM COMPREHENSION, PROCEEDINGS, 2002: 207 - 216
  • [46] Traceability recovery by modeling programmer behavior
    Antoniol, G
    Casazza, G
    Cimitile, A
    SEVENTH WORKING CONFERENCE ON REVERSE ENGINEERING - PROCEEDINGS, 2000: 240 - 247
  • [47] A Machine Learning Approach for Determining the Validity of Traceability Links
    Mills, Chris
    Haiduc, Sonia
    PROCEEDINGS OF THE 2017 IEEE/ACM 39TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING COMPANION (ICSE-C 2017), 2017: 121 - 123
  • [48] Visualizing Traceability Links between Source Code and Documentation
    Chen, Xiaofan
    Hosking, John
    Grundy, John
    2012 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC), 2012: 119 - 126
  • [49] Supporting evolutionary development by feature models and traceability links
    Riebisch, M
    11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOP ON THE ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 2004: 370 - 377
  • [50] Indecomposability and the number of links
    徐运阁
    张英伯
    Science China Mathematics, 2001, (12): 1515 - 1522