Handling Language Variations in Open Source Bug Reporting Systems

Authors
Banerjee, Sean [1]
Musgrove, Jesse [1]
Cukic, Bojan [1]
Affiliation
[1] W Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Keywords
Typographical Errors; Alternate Spellings; Duplicate Bug Reports; String Algorithms; Software Maintenance; Software Reliability;
DOI
10.1109/ISSREW.2012.85
Chinese Library Classification number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Natural language plays a critical role in the design, development, and maintenance of software systems. For example, bug reporting systems allow users to submit reports describing observed anomalies in free-form English. However, the free-form nature of these reports makes detecting duplicates a challenge because of the breadth and diversity of language used by individual reporters. Tokenization, stemming, and stop word removal are commonly used to normalize and reduce the language space, but the impact of typographical errors and alternate spellings has not been analyzed in the research literature. Our research indicates that handling these language problems during automated bug triage analysis can improve performance. We show that the language used in software problem reporting is too specialized to benefit from domain-independent spell checkers or lexical databases. Therefore, we present a novel approach that uses word distance and neighbor word likelihood measures to detect and resolve language-based issues in open-source software problem reporting. We evaluate our approach using the complete Firefox repository until March 2012. Our results indicate measurable improvements in duplicate detection, while reducing the language space for the most frequently used words by 30%. Moreover, our method is language-agnostic and does not require a pre-built dictionary, making it suitable for use in a variety of systems.
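The abstract names two signals, word distance and neighbor word likelihood, without giving their definitions. The following is only an illustrative sketch of how such signals might be combined, not the authors' method: it assumes Levenshtein edit distance for word distance and a Jaccard overlap of adjacent words as a stand-in for neighbor word likelihood. The function names, the thresholds `max_distance` and `min_overlap`, and the sample reports are all hypothetical.

```python
from collections import Counter, defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def neighbor_counts(reports):
    """Count word frequencies and, per word, its immediately adjacent words."""
    freq = Counter()
    neighbors = defaultdict(Counter)
    for text in reports:
        tokens = text.lower().split()
        freq.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            neighbors[left][right] += 1
            neighbors[right][left] += 1
    return freq, neighbors


def neighbor_overlap(a, b, neighbors):
    """Jaccard overlap of neighbor sets (assumed proxy for neighbor likelihood)."""
    na, nb = set(neighbors[a]), set(neighbors[b])
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def build_variant_map(reports, max_distance=1, min_overlap=0.5):
    """Map each rarer word to a more frequent word close in spelling and context."""
    freq, neighbors = neighbor_counts(reports)
    # Most frequent words first, so candidates are tried in frequency order.
    words = sorted(freq, key=lambda w: (-freq[w], w))
    mapping = {}
    for i, rare in enumerate(words):
        for common in words[:i]:
            # Only map toward strictly more frequent, unmapped words.
            if freq[common] <= freq[rare] or common in mapping:
                continue
            if (edit_distance(rare, common) <= max_distance
                    and neighbor_overlap(rare, common, neighbors) >= min_overlap):
                mapping[rare] = common
                break
    return mapping
```

With the hypothetical reports "firefox crashes on startup", "firefox crashes on exit", "firefox crashs on startup", and "browser freezes on startup", `build_variant_map` returns `{"crashs": "crashes"}`: the misspelling is one edit away from the more frequent word and shares both of its neighbors. Because the vocabulary comes from the reports themselves, the sketch needs no pre-built dictionary, which matches the property the abstract claims for the actual approach. The pairwise comparison is quadratic in vocabulary size, so a full repository would need candidate pruning.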
Pages: 325-330
Number of pages: 6