Combining text mining and data mining for bug report classification

Cited by: 99
Authors
Zhou, Yu [1, 2]
Tong, Yanxiang [1 ]
Gu, Ruihang [1 ]
Gall, Harald [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[3] Univ Zurich, Dept Informat, CH-8050 Zurich, Switzerland
Funding
National Natural Science Foundation of China;
Keywords
software evolution; bug report classification; data mining; text mining; SOFTWARE; SEVERITY; FAULTS;
DOI
10.1002/smr.1770
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Bug reports represent an important information source for software construction. Misclassification of these reports inevitably introduces bias. Manual examination can help reduce the noise, but it places a heavy burden on developers. In this paper, we propose a multi-stage approach that combines text mining and data mining techniques to automate the prediction process. The first stage leverages text mining techniques to analyze the summary parts of bug reports and classifies them into three levels of probability. The extracted features and some other structured features of bug reports are then fed into the machine learner in the second stage. Data grafting techniques are employed to bridge the two stages. Comparative experiments with previous studies on the same data (three large-scale open-source projects) consistently achieve a reasonable enhancement (from 77.4% to 81.7%, 76.1% to 81.6%, and 87.4% to 93.7%, respectively) over their best results in terms of overall performance. Additional comparative empirical experiments on seven other popular open-source systems confirm the findings. Moreover, based on the data obtained, we also empirically studied the relationship between the underlying classifiers and various other properties of the combined model. A prototypical recommender system has been developed to demonstrate the applicability of our approach. Copyright (c) 2016 John Wiley & Sons, Ltd.
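The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of a naive Bayes text model, the thresholds 0.4 and 0.6, the toy training data, and every function and field name (`train_nb`, `prob_bug`, `probability_level`, `graft`, `text_level`, `severity`) are assumptions made for illustration only. Stage one scores a report summary and bins the score into three probability levels; the grafting step attaches that level to the structured fields so that a second-stage learner of any kind could consume the merged record.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train a tiny multinomial naive Bayes model on tokenized summaries.

    Assumes both classes (True = bug, False = non-bug) occur in `labels`.
    """
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def prob_bug(model, tokens):
    """Return P(bug | tokens) using Laplace smoothing; unseen words are skipped."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for tok in tokens:
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    m = max(log_scores.values())
    e_true = math.exp(log_scores[True] - m)
    e_false = math.exp(log_scores[False] - m)
    return e_true / (e_true + e_false)


def probability_level(p, low=0.4, high=0.6):
    """Bin a probability into three levels (thresholds are hypothetical)."""
    if p >= high:
        return "high"
    if p <= low:
        return "low"
    return "middle"


def graft(summary, structured, model):
    """Bridge the stages: add the stage-one level to the structured features."""
    p = prob_bug(model, summary.lower().split())
    return {**structured, "text_level": probability_level(p)}


# Toy data, invented for this sketch.
docs = [
    ["crash", "on", "startup"],
    ["null", "pointer", "exception"],
    ["add", "dark", "mode"],
    ["improve", "documentation"],
]
labels = [True, True, False, False]
model = train_nb(docs, labels)

record = graft("Crash exception", {"severity": "major"}, model)
print(record)
```

The merged record would then serve as one training or prediction instance for the second-stage classifier, which the sketch deliberately leaves open.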
Pages: 150 - 176
Page count: 27
Related papers
50 in total
  • [41] A TEXT MINING TECHNIQUE FOR MANUFACTURING SUPPLIER CLASSIFICATION
    Yazdizadeh, Peyman
    Ameri, Farhad
    INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2015, VOL 1B, 2016,
  • [42] Malware Detection by Text and Data Mining
    Sundarkumar, G. Ganesh
    Ravi, Vadlamani
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2013, : 566 - 571
  • [43] Pattern and Cluster Mining on Text Data
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 428 - 432
  • [44] Text Mining in Big Data Analytics
    Cogburn, Derrick L.
    Hine, Michael J.
    Peladeau, Normand
    Yoon, Victoria Y.
    PROCEEDINGS OF THE 51ST ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2018, : 584 - 586
  • [45] Text and data mining: Together at last!
    Trippe, AJ
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2005, 230 : U1006 - U1006
  • [46] The research of classification technologies based on text mining
    Liu, LZ
    Li, PZ
    Chen, JJ
    ISTM/2005: 6th International Symposium on Test and Measurement, Vols 1-9, Conference Proceedings, 2005, : 8517 - 8520
  • [47] Text Mining: Sentiment Analysis on news classification
    Gomes, Helder
    Neto, Miguel de Castro
    Henriques, Roberto
    PROCEEDINGS OF THE 2013 8TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI 2013), 2013,
  • [48] Research article classification with text mining method
    Gurbuz, Tugba
    Uluyol, Celebi
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (01):
  • [49] Classification of software patches: a text mining approach
    Raja, Uzma
    Tretter, Marietta J.
    JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION-RESEARCH AND PRACTICE, 2011, 23 (02): : 69 - 87
  • [50] Hierarchical Classification in Text Mining for Sentiment Analysis
    Li, Jinyan
    Fong, Simon
    Zhuang, Yan
    Khoury, Richard
    2014 INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE ISCMI 2014, 2014, : 46 - 51