Combining text mining and data mining for bug report classification

Cited by: 99
Authors
Zhou, Yu [1, 2]
Tong, Yanxiang [1 ]
Gu, Ruihang [1 ]
Gall, Harald [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[3] Univ Zurich, Dept Informat, CH-8050 Zurich, Switzerland
Funding
National Natural Science Foundation of China;
Keywords
software evolution; bug report classification; data mining; text mining; SOFTWARE; SEVERITY; FAULTS;
DOI
10.1002/smr.1770
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Bug reports represent an important information source for software construction. Misclassification of these reports inevitably introduces bias. Manual examination can help reduce the noise, but it places a heavy burden on developers. In this paper, we propose a multi-stage approach that combines text mining and data mining techniques to automate the prediction process. The first stage leverages text mining techniques to analyze the summary parts of bug reports and classifies them into three levels of probability. The extracted features and some other structured features of bug reports are then fed into the machine learner in the second stage. Data grafting techniques are employed to bridge the two stages. Comparative experiments with previous studies on the same data (three large-scale open-source projects) consistently achieve a reasonable enhancement (from 77.4% to 81.7%, 76.1% to 81.6%, and 87.4% to 93.7%, respectively) over their best results in terms of overall performance. Additional comparative empirical experiments on seven other popular open-source systems confirm the findings. Moreover, based on the data obtained, we also empirically studied the relation between the underlying classifiers and various other properties of the combined model. A prototypical recommender system has been developed to demonstrate the applicability of our approach. Copyright (c) 2016 John Wiley & Sons, Ltd.
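The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: naive Bayes is used as the learner in both stages, the thresholds 0.35 and 0.65 that define the three probability levels are hypothetical, the `severity` field and the toy reports are invented, and "data grafting" is approximated simply by merging the stage-one level into the structured record.

```python
import math
from collections import Counter, defaultdict

def train_text_nb(summaries, labels):
    """Stage 1 (assumed learner): multinomial naive Bayes over summary words.
    Assumes both classes (True = bug, False = non-bug) occur in the labels."""
    word_counts = {True: Counter(), False: Counter()}
    for text, is_bug in zip(summaries, labels):
        word_counts[is_bug].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, Counter(labels), vocab

def bug_probability(model, text):
    """Posterior probability that a summary describes a bug."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_post = {}
    for cls in (True, False):
        n_words = sum(word_counts[cls].values())
        score = math.log(class_counts[cls] / total)
        for word in text.lower().split():
            if word in vocab:  # ignore unseen words
                # Laplace-smoothed word likelihood
                score += math.log((word_counts[cls][word] + 1) / (n_words + len(vocab)))
        log_post[cls] = score
    top = max(log_post.values())
    exp = {cls: math.exp(s - top) for cls, s in log_post.items()}
    return exp[True] / (exp[True] + exp[False])

def probability_level(p, low=0.35, high=0.65):
    """Discretize into three levels; both thresholds are hypothetical."""
    if p >= high:
        return "high"
    if p <= low:
        return "low"
    return "middle"

def graft(level, structured):
    """Bridge the stages (simplified): add the text level to the structured fields."""
    return {"text_level": level, **structured}

def train_stage2(rows, labels):
    """Stage 2 (assumed learner): categorical naive Bayes over grafted rows."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, field) -> value frequencies
    values_seen = defaultdict(set)       # field -> distinct values
    for row, cls in zip(rows, labels):
        for field, value in row.items():
            value_counts[(cls, field)][value] += 1
            values_seen[field].add(value)
    return class_counts, value_counts, values_seen

def predict_stage2(model, row):
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -math.inf
    for cls, n in class_counts.items():
        score = math.log(n / total)
        for field, value in row.items():
            k = len(values_seen[field]) or 1
            score += math.log((value_counts[(cls, field)][value] + 1) / (n + k))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Toy, invented reports: (summary, structured fields, is_bug)
reports = [
    ("crash when saving file", {"severity": "major"}, True),
    ("null pointer exception on startup", {"severity": "critical"}, True),
    ("add dark theme option", {"severity": "enhancement"}, False),
    ("improve documentation for installer", {"severity": "minor"}, False),
]
labels = [is_bug for _, _, is_bug in reports]
text_model = train_text_nb([s for s, _, _ in reports], labels)
rows = [graft(probability_level(bug_probability(text_model, s)), f) for s, f, _ in reports]
stage2_model = train_stage2(rows, labels)

def classify(summary, structured):
    """Run both stages on a new report."""
    level = probability_level(bug_probability(text_model, summary))
    return predict_stage2(stage2_model, graft(level, structured))
```

In this sketch, calling `classify("crash on startup", {"severity": "critical"})` runs the summary through the text stage first and then lets the second learner weigh the resulting level against the structured field. A real system would train on far more data and would likely use richer text features than raw word counts.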
Pages: 150-176
Number of pages: 27
Related papers
50 in total
  • [21] Integration of text and data mining
    Drewes, B
    DATA MINING III, 2002, 6 : 289 - 298
  • [22] Text summarization in data mining
    Crangle, CE
    SOFT-WARE 2002: COMPUTING IN AN IMPERFECT WORLD, 2002, 2311 : 332 - 347
  • [23] Data mining combining data walkthrough
    Ohkura, M
    Shimizu, M
    Kakizawa, Y
    Nakayama, N
    VSMM 2000: 6TH INTERNATIONAL CONFERENCE ON VIRTUAL SYSTEMS AND MULTIMEDIA, 2000, : 483 - 489
  • [24] A SURVEY ON CLASSIFICATION TECHNIQUES FOR TEXT MINING
    Brindha, S.
    Sukumaran, S.
    Prabha, K.
    2016 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2016,
  • [25] Cyberbullying Classification using Text Mining
    Noviantho
    Isa, Sani Muhamad
    Ashianti, Livia
    2017 1ST INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTATIONAL SCIENCES (ICICOS), 2017, : 241 - 245
  • [26] Text mining in the classification of digital documents
    Contreras Barrera, Marcial
    BIBLIOS-REVISTA DE BIBLIOTECOLOGIA Y CIENCIAS DE LA INFORMACION, 2016, (64): : 33 - 43
  • [27] Text Mining for Type of Research Classification
    Lowe, David B.
    Dollinger, Ian
    Koster, Tristan
    Herbert, Bruce E.
    CATALOGING & CLASSIFICATION QUARTERLY, 2021, 59 (08) : 815 - 834
  • [28] Data mining algorithm for text data
    Chen, Yuquan
    Zhu, Xijun
    Lu, Ruzhan
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2000, 34 (07): : 936 - 938
  • [29] The mining bug
    Evans-Pughe C.
    Engineering and Technology, 2010, 5 (15): : 54 - 57
  • [30] Data Mining Diagnostics and Bug MRIs for HW Bug Localization
    Farkash, Monica
    Hickerson, Bryan
    Samynathan, Balavinayagam
    2015 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2015, : 79 - 84