Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data

Cited by: 6
Authors
Mazrae, Pooya Rostami [1]
Izadi, Maliheh [1]
Heydarnoori, Abbas [1]
Affiliations
[1] Sharif Univ Technol, Comp Engn Dept, Tehran, Iran
Keywords
Link Recovery; Issue Report; Commit; Software Maintenance; Machine Learning; Ensemble Methods;
DOI
10.1109/ICSME52107.2021.00030
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
An issue report documents the discussion around a required change in an issue-tracking system, while a commit contains the change itself in the version control system. Recovering links between issues and commits can facilitate many software evolution tasks, such as bug localization, defect prediction, software quality measurement, and software documentation. A previous study of over half a million GitHub issues reports that only about 42.2% of issues are manually linked by developers to their pertinent commits. Automating the linking of commit-issue pairs can therefore improve these tasks. Current state-of-the-art approaches for automated commit-issue linking suffer from low precision, which leads to unreliable results, sometimes to the point that the predicted links require human supervision. Performance degrades further when textual information is lacking in either commits or issues. Current approaches have also been shown to be computationally expensive. We propose Hybrid-Linker, an enhanced approach that overcomes these limitations by exploiting two information channels: (1) a non-textual component that operates on non-textual, automatically recorded information of the commit-issue pairs to predict a link, and (2) a textual component that does the same using the textual information of the commit-issue pairs. Hybrid-Linker then combines the results of the two classifiers to make the final prediction. Thus, whenever one component falls short in predicting a link, the other component fills the gap and improves the results. We evaluate Hybrid-Linker against two competing approaches, FRLink and DeepLink, on a dataset of 12 projects. Hybrid-Linker achieves 90.1% recall, 87.8% precision, and 88.9% F-measure. It also outperforms FRLink and DeepLink by 31.3% and 41.3%, respectively, in F-measure. Moreover, the proposed approach exhibits extensive improvements in terms of performance as well. Finally, our source code and data are publicly available.
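The two-channel idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how Hybrid-Linker combines the two classifiers, so the weighted average below is only an assumption for illustration; the function names, the weight `alpha`, and the decision `threshold` are all hypothetical. The second helper checks that the reported F-measure is consistent with the reported precision and recall.

```python
def combine_link_scores(p_textual, p_non_textual, alpha=0.5, threshold=0.5):
    """Combine two link probabilities into one prediction.

    Assumption: a simple weighted average. The abstract only states that
    the results of the two classifiers are combined, not how.
    `alpha` (weight of the textual channel) and `threshold` are hypothetical.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    score = alpha * p_textual + (1.0 - alpha) * p_non_textual
    return score, score >= threshold


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A commit-issue pair with sparse text: the textual channel is unsure,
# but the non-textual channel is confident, so the pair is still linked.
score, is_link = combine_link_scores(p_textual=0.2, p_non_textual=0.9)
print(f"combined score={score:.2f}, linked={is_link}")

# Consistency check on the figures reported in the abstract.
print(f"F-measure from P=87.8%, R=90.1%: {f_measure(0.878, 0.901):.3f}")
```

With equal weights, a weak textual score can be offset by a strong non-textual one, which mirrors the claim that one component fills the gap when the other falls short.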
Pages: 263-273
Page count: 11
Related papers
5 in total
  • [1] Leveraging Textual and Non-Textual Features for Documentation Decluttering
    Colavito, Giuseppe
    Basile, Pierpaolo
    Novielli, Nicole
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020), 2020, : 862 - 863
  • [2] FRLink: Improving the recovery of missing issue-commit links by revisiting file relevance
    Sun, Yan
    Wang, Qing
    Yang, Ye
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2017, 84 : 33 - 47
  • [3] Improving Missing Issue-Commit Link Recovery using Positive and Unlabeled Data
    Sun, Yan
    Chen, Celia
    Wang, Qing
    Boehm, Barry
    [J]. PROCEEDINGS OF THE 2017 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE'17), 2017, : 147 - 152
  • [4] Automated data function extraction from textual requirements by leveraging semi-supervised CRF and language model
    Li, Mingyang
    Shi, Lin
    Wang, Yawen
    Wang, Junjie
    Wang, Qing
    Hu, Jun
    Peng, Xinhua
    Liao, Weimin
    Pi, Guizhen
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 143
  • [5] Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news
    Geva, Tomer
    Zahavi, Jacob
    [J]. DECISION SUPPORT SYSTEMS, 2014, 57 : 212 - 223