Mining software architecture knowledge: Classifying stack overflow posts using machine learning

Cited by: 3
Authors
Ali, Mubashir [1]
Mushtaq, Husnain [2]
Rasheed, Muhammad B. [3, 4]
Baqir, Anees [5]
Alquthami, Thamer [6]
Affiliations
[1] Univ Bergamo, DIGIP, Bergamo, Italy
[2] Univ Lahore, Dept Comp Sci, Gujrat, Pakistan
[3] Univ Alcala, Dept Comp Engn, Madrid 28801, Spain
[4] Univ Lahore, Dept Elect & Elect Syst, Lahore, Pakistan
[5] Ca Foscari Univ Venice, Dept Environm Sci Informat & Stat, Venice, Italy
[6] King Abdulaziz Univ, Dept Elect Engn & Comp Engn, Jeddah, Saudi Arabia
Source
Keywords
architectural knowledge management; stack overflow; crowd-sourced communities; text mining; classification
DOI
10.1002/cpe.6277
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
The Software Architectural Process (SAP) is a core and highly knowledge-intensive phase of the software development life cycle, as it simultaneously consumes and produces knowledge artifacts. SAP is about making design decisions, and changes to these decisions may adversely affect software projects. Design decisions fundamentally influence the performance and properties of software components. Implementing immature or abrupt design decisions seriously threatens the SAP. Software architectural knowledge management (AKM) approaches offer systematic ways to support SAP through versatile architectural solutions and design decisions. However, the majority of software organizations have limited access to data and still depend on manually created and maintained AKM processes. In this paper, we use one of the most prominent online communities for software development (i.e., Stack Overflow) as a source of SAP knowledge to support AKM. We propose a supervised machine learning-based approach to classify architectural knowledge into predefined categories, that is, analysis, synthesis, evaluation, and implementation. We employ different combinations of feature selection techniques to achieve optimal classification results for the classifiers used (Support Vector Machine [SVM], K-Nearest Neighbor, Random Forest, and Naive Bayes [NB]). Among these classifiers, SVM with the uni-gram feature set provides the best classification results and attains 85.80% accuracy. To evaluate the effectiveness of the proposed approach, we also compute the suitability of the classifiers, that is, the cost of computation along with accuracy, and NB with the uni-gram feature set proves to be the most suitable.
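As a rough illustration of the kind of classifier the abstract describes, the following minimal sketch trains a multinomial Naive Bayes model on uni-gram (single-word) features and assigns a post to one of the four categories named above. It is not the authors' implementation: the class name `UnigramNaiveBayes`, the tokenizer, the add-one smoothing, and the eight toy training posts are all assumptions made for illustration, and the sketch uses only the Python standard library rather than any particular machine learning toolkit.

```python
import math
import re
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercase the text and split it into word tokens (uni-gram features)."""
    return re.findall(r"[a-z]+", text.lower())


class UnigramNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # number of posts per category
        self.word_counts = defaultdict(Counter)    # token counts per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in unigrams(text):
                if tok in self.vocab:              # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts, invented for this sketch (not data from the paper).
TRAIN = [
    ("What requirements and constraints should drive the architecture?", "analysis"),
    ("Which quality requirements matter for this system?", "analysis"),
    ("Which design pattern or candidate solution fits a plugin system?", "synthesis"),
    ("Proposing a layered design solution for the service", "synthesis"),
    ("How to evaluate the tradeoff between performance and scalability?", "evaluation"),
    ("Compare and evaluate two architectures for scalability", "evaluation"),
    ("How do I implement the repository pattern code in Java?", "implementation"),
    ("Code to implement dependency injection in a class", "implementation"),
]

model = UnigramNaiveBayes().fit(*zip(*TRAIN))
print(model.predict("How should I evaluate scalability of two options?"))
```

Naive Bayes is used here only because it is the simplest of the four classifiers listed in the abstract to write from scratch; the same uni-gram features could feed an SVM, K-Nearest Neighbor, or Random Forest classifier instead.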
Pages: 17
Related papers
50 in total
  • [1] PostFinder: Mining Stack Overflow posts to support software developers
    Rubei, Riccardo
    Di Sipio, Claudio
    Nguyen, Phuong T.
    Di Rocco, Juri
    Di Ruscio, Davide
    INFORMATION AND SOFTWARE TECHNOLOGY, 2020, 127
  • [2] Classifying Stack Overflow Posts on API Issues
    Ahasanuzzaman, Md
    Asaduzzaman, Muhammad
    Roy, Chanchal K.
    Schneider, Kevin A.
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2018), 2018, : 244 - 254
  • [3] Automatically Classifying Posts into Question Categories on Stack Overflow
    Beyer, Stefanie
    Macho, Christian
    Pinzger, Martin
    Di Penta, Massimiliano
    2018 IEEE/ACM 26TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2018), 2018, : 211 - 221
  • [4] Sentiment overflow in the testing stack: Analyzing software testing posts on Stack Overflow
    Swillus, Mark
    Zaidman, Andy
    JOURNAL OF SYSTEMS AND SOFTWARE, 2023, 205
  • [5] Mining Architecture Tactics and Quality Attributes knowledge in Stack Overflow
    Bi, Tingting
    Liang, Peng
    Tang, Antony
    Xia, Xin
    JOURNAL OF SYSTEMS AND SOFTWARE, 2021, 180
  • [6] Characterizing architecture related posts and their usefulness in Stack Overflow
    Dieu, Musengamana Jean de
    Liang, Peng
    Shahin, Mojtaba
    Khan, Arif Ali
    JOURNAL OF SYSTEMS AND SOFTWARE, 2023, 198
  • [7] Why is Developing Machine Learning Applications Challenging? A Study on Stack Overflow Posts
    Alshangiti, Moayad
    Sapkota, Hitesh
    Murukannaiah, Pradeep K.
    Liu, Xumin
    Yu, Qi
    2019 13TH ACM/IEEE INTERNATIONAL SYMPOSIUM ON EMPIRICAL SOFTWARE ENGINEERING AND MEASUREMENT (ESEM 2019), 2019, : 117 - 127
  • [8] What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow
    Ghadesi, Amin
    Lamothe, Maxime
    Li, Heng
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (05)
  • [9] CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues
    Ahasanuzzaman, Md
    Asaduzzaman, Muhammad
    Roy, Chanchal K.
    Schneider, Kevin A.
    EMPIRICAL SOFTWARE ENGINEERING, 2020, 25 : 1493 - 1532
  • [10] Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts
    Yang, Wenhua
    Zhang, Chong
    Pan, Minxue
    PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023, : 1444 - 1456