Detecting duplicate bug reports with software engineering domain knowledge

Cited by: 30
Authors
Aggarwal, Karan [1]
Timbers, Finbarr [1]
Rutgers, Tanner [1]
Hindle, Abram [1]
Stroulia, Eleni [1]
Greiner, Russell [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords
deduplication; documentation; duplicate bug reports; information retrieval; machine learning; software engineering textbooks; software literature
DOI
10.1002/smr.1821
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Bug deduplication, i.e., recognizing bug reports that refer to the same problem, is a challenging task in the software-engineering life cycle. Researchers have proposed several methods that rely primarily on information-retrieval techniques. Our work, motivated by the intuition that domain knowledge can provide the relevant context to enhance effectiveness, attempts to improve the use of information retrieval by augmenting it with software-engineering knowledge. In our previous work, we proposed the software-literature-context method, which uses software-engineering literature as a source of contextual information to detect duplicates. If bug reports relate to similar subjects, they have a better chance of being duplicates. Our method, being largely automated, has the potential to substantially decrease the level of manual effort involved in conventional techniques, with a minor trade-off in accuracy. In this study, we extend our work by demonstrating that domain-specific features can be applied across projects, rather than the project-specific features demonstrated previously, while still maintaining performance. We also introduce a hierarchy of context to capture software-engineering knowledge in the realms of contextual space and produce performance gains. Finally, we highlight the importance of domain-specific contextual features through cross-domain contexts: adding context improved accuracy, and Kappa scores improved by at least 3.8% and by up to 10.8% per project.
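
The intuition in the abstract, that reports about similar subjects are more likely to be duplicates, can be illustrated with a minimal sketch. This is not the authors' implementation: the topic word lists in `CONTEXT`, the function names, and the use of cosine similarity are all assumptions made for illustration. In the paper, the contextual word lists come from software-engineering literature rather than being hand-written.

```python
import math
from collections import Counter

# Hypothetical topic word lists; stand-ins for contexts that would be
# extracted from software-engineering literature.
CONTEXT = {
    "testing": {"test", "assert", "coverage", "regression"},
    "ui": {"button", "screen", "click", "layout"},
    "storage": {"database", "disk", "file", "save"},
}


def context_vector(text):
    """Count how many words of the report fall into each context topic."""
    words = Counter(text.lower().split())
    return [sum(words[w] for w in topic) for topic in CONTEXT.values()]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_similarity(report_a, report_b):
    """Contextual feature: how alike two reports are in the subjects they touch."""
    return cosine(context_vector(report_a), context_vector(report_b))


a = "Save button click does nothing on screen"
b = "Clicking the save button leaves the screen layout frozen"
c = "Regression test coverage dropped"
print(context_similarity(a, b), context_similarity(a, c))
```

A score like this would serve as one additional feature alongside textual similarity when a classifier decides whether a pair of reports is a duplicate; it is not a duplicate detector on its own.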
Pages: 15