An HMM-based approach for automatic detection and classification of duplicate bug reports

Cited by: 30
Authors
Ebrahimi, Neda [1]
Trabelsi, Abdelaziz [1]
Islam, Md Shariful [1]
Hamou-Lhadj, Abdelwahab [1]
Khanmohammadi, Kobra [1]
Affiliation
[1] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Duplicate bug reports; Stack traces; Hidden Markov models; Machine learning; Mining software repositories; INFORMATION-RETRIEVAL; LOCALIZATION; MODELS;
DOI
10.1016/j.infsof.2019.05.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to issue tracking systems carry crucial information about the nature of a crash, such as text from users or developers and execution information about the functions running before the crash occurred. Large software projects typically receive thousands of reports every day.
Objective: The aim is to reduce the time and effort required to fix bugs while improving overall software quality. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones; for example, as many as 30% of all reports for Firefox are duplicates.
Method: While a wide variety of approaches detect duplicate bug reports automatically using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models.
Results: Applying our approach to the Firefox and GNOME datasets, we show that for Firefox the average recall is 59% at rank k = 1 and 75.55% at k = 2, and recall reaches 90% from k = 10 onward. The Mean Average Precision (MAP) is up to 76.5%. For GNOME, the recall at k = 1 is around 63%, increases by about 10% at k = 2, and reaches 97% at k = 11. A MAP of up to 73% is achieved.
Conclusion: We show that HMMs and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
Pages: 98-109
Page count: 12
Related papers
50 in total
  • [1] A Novel Technique for Duplicate Detection and Classification of Bug Reports
    Zhang, Tao
    Lee, Byungjeong
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D(07): 1756-1768
  • [2] HMM-BASED APPROACH FOR AUTOMATIC CHORD DETECTION USING REFINED ACOUSTIC FEATURES
    Ueda, Yushi
    Uchiyama, Yuki
    Nishimoto, Takuya
    Ono, Nobutaka
    Sagayama, Shigeki
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 5518-5521
  • [3] A universal HMM-based approach to image sequence classification
    Morguet, P
    Lang, M
    [J]. INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL III, 1997: 146-149
  • [4] Automatic Information Extraction from the Web: An HMM-Based Approach
    Tran-Le, M. S.
    Vo-Dang, T. T.
    Ho-Van, Quan
    Dang, T. K.
    [J]. MODELING, SIMULATION AND OPTIMIZATION OF COMPLEX PROCESSES, 2008: 575-585
  • [5] Fast Detection of Duplicate Bug Reports using LDA-based Topic Modeling and Classification
    Akilan, Thangarajah
    Shah, Dhruvit
    Patel, Nishi
    Mehta, Rinkal
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020: 1622-1629
  • [6] An embedded HMM-based approach for face detection and recognition
    Nefian, AV
    Hayes, MH
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 3553-3556
  • [7] An HMM-Based Anomaly Detection Approach for SCADA Systems
    Stefanidis, Kyriakos
    Voyiatzis, Artemios G.
    [J]. INFORMATION SECURITY THEORY AND PRACTICE, WISTP 2016, 2016, 9895: 85-99
  • [8] AVS - An approach to identifying and mitigating duplicate bug reports
    Santos, Ivan
    Araujo, Joelson
    Lima, Cloves
    Prudencio, Ricardo B. C.
    Barros, Flavia
    [J]. PROCEEDINGS OF THE 14TH BRAZILIAN SYMPOSIUM ON INFORMATION SYSTEMS (SBSI2018), 2018: 168-174
  • [9] A HMM-BASED METHOD FOR ANOMALY DETECTION
    Wang, Fei
    Zhu, Hongliang
    Tian, Bin
    Xin, Yang
    Niu, Xinxin
    Yang, Yu
    [J]. 2011 4TH IEEE INTERNATIONAL CONFERENCE ON BROADBAND NETWORK AND MULTIMEDIA TECHNOLOGY (4TH IEEE IC-BNMT2011), 2011: 276-280
  • [10] Detection of Duplicate Bug Reports in Jira and Bugzilla Tools
    Aldan, Cigdem
    Demir, Engin
    [J]. 2020 TURKISH NATIONAL SOFTWARE ENGINEERING SYMPOSIUM (UYMS), 2020: 126-129