Detecting duplicate bug reports with software engineering domain knowledge

Cited by: 30
Authors
Aggarwal, Karan [1]
Timbers, Finbarr [1]
Rutgers, Tanner [1]
Hindle, Abram [1]
Stroulia, Eleni [1]
Greiner, Russell [1]
Affiliation
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords
deduplication; documentation; duplicate bug reports; information retrieval; machine learning; software engineering textbooks; software literature;
DOI
10.1002/smr.1821
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Bug deduplication, i.e., recognizing bug reports that refer to the same problem, is a challenging task in the software-engineering life cycle. Researchers have proposed several methods, primarily relying on information-retrieval techniques. Our work, motivated by the intuition that domain knowledge can provide the relevant context to enhance effectiveness, attempts to improve the use of information retrieval by augmenting it with software-engineering knowledge. In our previous work, we proposed the software-literature-context method for using software-engineering literature as a source of contextual information to detect duplicates. If bug reports relate to similar subjects, they have a better chance of being duplicates. Our method, being largely automated, has the potential to substantially decrease the level of manual effort involved in conventional techniques, with a minor trade-off in accuracy. In this study, we extend our work by demonstrating that domain-specific features can be applied across projects, rather than the project-specific features demonstrated previously, while still maintaining performance. We also introduce a hierarchy-of-context to capture the software-engineering knowledge in the realms of contextual space to produce performance gains. We also highlight the importance of domain-specific contextual features through cross-domain contexts: adding context improved accuracy, and Kappa scores improved by at least 3.8% to 10.8% per project.
Pages: 15
Related papers
50 in total
  • [41] Efficient feature extraction model for validation performance improvement of duplicate bug report detection in software bug triage systems
    Neysiani, Behzad Soleimani
    Babamir, Seyed Morteza
    Aritsugi, Masayoshi
    INFORMATION AND SOFTWARE TECHNOLOGY, 2020, 126
  • [42] An Empirical Study of the Effects of Expert Knowledge on Bug Reports
    Huo, Da
    Ding, Tao
    McMillan, Collin
    Gethers, Malcom
    2014 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME), 2014, : 1 - 10
  • [43] Automotive software engineering - An emerging application domain for software engineering
    Salzmann, C
    Stauner, T
    LANGUAGES FOR SYSTEM SPECIFICATION: SELECTED CONTRIBUTIONS ON UML, SYSTEMC, SYSTEM VERILOG, MIXED-SIGNAL SYSTEMS, AND PROPERTY SPECIFICATION FROM FDL'03, 2004, : 333 - 347
  • [44] An Analysis of Software Bug Reports Using Machine Learning Techniques
    Tran H.M.
    Le S.T.
    Nguyen S.V.
    Ho P.T.
    SN Computer Science, 2020, 1 (1)
  • [45] Raising the Quality of Bug Reports by Predicting Software Defect Indicators
    Gromova, Anna
    Itkin, Iosif
    Pavlov, Sergey
    Korovayev, Alexander
    2019 COMPANION OF THE 19TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS-C 2019), 2019, : 198 - 204
  • [46] Identifying the domain of software engineering
    Kratchanov, KD
    Mehic, N
    International Conference on Computing, Communications and Control Technologies, Vol 1, Proceedings, 2004, : 148 - 155
  • [47] Fast Detection of Duplicate Bug Reports using LDA-based Topic Modeling and Classification
    Akilan, Thangarajah
    Shah, Dhruvit
    Patel, Nishi
    Mehta, Rinkal
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 1622 - 1629
  • [48] The use of bibliometric and knowledge elicitation techniques to map a knowledge domain: Software Engineering in the 1990s
    Katherine W. McCain
    June M. Verner
    Gregory W. Hislop
    William Evanco
    Vera Cole
    Scientometrics, 2005, 65 : 131 - 144
  • [49] Problem On Software Engineering Learning: Domain Engineering
    Marcondes, Francisco S.
    Brumatto, Hamilton J.
    Sonoda, Eloiza H.
    Barboza, Luiz C.
    Zannuto, Jefferson
    PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, : 1636 - 1636
  • [50] The use of bibliometric and Knowledge Elicitation techniques to map a knowledge domain: Software Engineering in the 1990s
    McCain, KW
    Verner, JM
    Hislop, GW
    Evanco, W
    Cole, V
    SCIENTOMETRICS, 2005, 65 (01) : 131 - 144