Combining text mining and data mining for bug report classification

Cited by: 99
Authors
Zhou, Yu [1, 2]
Tong, Yanxiang [1 ]
Gu, Ruihang [1 ]
Gall, Harald [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[3] Univ Zurich, Dept Informat, CH-8050 Zurich, Switzerland
Funding
National Natural Science Foundation of China
Keywords
software evolution; bug report classification; data mining; text mining; SOFTWARE; SEVERITY; FAULTS;
DOI
10.1002/smr.1770
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Bug reports are an important source of information for software construction. Misclassification of these reports inevitably introduces bias. Manual examination can help reduce the noise, but it imposes a heavy burden on developers. In this paper, we propose a multi-stage approach that combines text mining and data mining techniques to automate the prediction process. The first stage leverages text mining techniques to analyze the summary parts of bug reports and classifies them into three levels of probability. The extracted features and some other structured features of bug reports are then fed into the machine learner in the second stage. Data grafting techniques are employed to bridge the two stages. Comparative experiments with previous studies on the same data (three large-scale open-source projects) consistently achieve a reasonable enhancement (from 77.4% to 81.7%, 76.1% to 81.6%, and 87.4% to 93.7%, respectively) over their best results in terms of overall performance. Additional comparative empirical experiments on seven other popular open-source systems confirm the findings. Moreover, based on the data obtained, we also empirically studied the relation between the underlying classifiers and various other properties of the combined model. A prototypical recommender system has been developed to demonstrate the applicability of our approach. Copyright (c) 2016 John Wiley & Sons, Ltd.
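The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary bug / non-bug target, a small Naive Bayes text model for stage one, hypothetical cut-offs (0.4 and 0.6) for the three probability levels, hypothetical structured fields `severity` and `priority`, and a categorical Naive Bayes standing in for the second-stage machine learner. The toy reports are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_text_nb(summaries, labels):
    """Stage 1 (assumed): multinomial Naive Bayes over summary tokens."""
    word_counts = {True: Counter(), False: Counter()}
    for text, label in zip(summaries, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, Counter(labels), vocab

def bug_probability(text_model, summary):
    """Posterior probability that a summary describes a bug (Laplace smoothing)."""
    word_counts, class_counts, vocab = text_model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for tok in summary.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[True] / (exp[True] + exp[False])

def to_level(p, low=0.4, high=0.6):
    """Map a probability to one of three levels; the thresholds are hypothetical."""
    return "high" if p >= high else "low" if p <= low else "middle"

def graft(text_model, report):
    """Data grafting: attach the stage-1 level to the structured fields."""
    return {"text_level": to_level(bug_probability(text_model, report["summary"])),
            "severity": report["severity"], "priority": report["priority"]}

def train_categorical_nb(rows, labels):
    """Stage 2 (assumed stand-in learner): categorical Naive Bayes."""
    feat_counts, values = defaultdict(Counter), defaultdict(set)
    for row, label in zip(rows, labels):
        for feat, val in row.items():
            feat_counts[(feat, label)][val] += 1
            values[feat].add(val)
    return Counter(labels), feat_counts, values

def predict(stage2_model, row):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, feat_counts, values = stage2_model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)
        for feat, val in row.items():
            score += math.log((feat_counts[(feat, label)][val] + 1) / (n + len(values[feat])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, for illustration only.
reports = [
    {"summary": "crash when saving file", "severity": "critical", "priority": "P1", "is_bug": True},
    {"summary": "null pointer exception on startup", "severity": "major", "priority": "P2", "is_bug": True},
    {"summary": "add support for dark theme", "severity": "enhancement", "priority": "P4", "is_bug": False},
    {"summary": "improve documentation for installer", "severity": "minor", "priority": "P3", "is_bug": False},
]
labels = [r["is_bug"] for r in reports]
text_model = train_text_nb([r["summary"] for r in reports], labels)
stage2_model = train_categorical_nb([graft(text_model, r) for r in reports], labels)

new_report = {"summary": "crash on startup", "severity": "critical", "priority": "P1"}
prediction = predict(stage2_model, graft(text_model, new_report))
```

The design point the sketch tries to capture is that the second stage never sees raw text: it only sees the coarse level produced by the first stage, placed alongside the structured fields of the report.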
Pages: 150-176
Page count: 27