Topic modeling in software engineering research

Authors
Camila Costa Silva
Matthias Galster
Fabian Gilson
Affiliation
[1] University of Canterbury
Keywords
Topic modeling; Text mining; Natural language processing; Literature analysis
Abstract
Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique for extracting human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and the modeling parameters). Our study describes how topic modeling has been applied in software engineering research, with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modeled most, (3) data pre-processing and modeling parameters vary considerably and are often vaguely reported, and (4) manual topic naming (such as deducing names from frequent words in a topic) is common.
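The pipeline the abstract refers to (pre-process text, fit LDA, then name each topic from its frequent words) can be illustrated with a minimal sketch. This is not the method of any surveyed paper: the stop-word list, the toy bug-report corpus, the hyperparameter values, and the function names `preprocess`, `fit_lda`, and `top_words` are all assumptions made for illustration, and the sampler is a plain collapsed Gibbs sampler written with the standard library only.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "to", "in", "on", "is", "and", "of", "when", "for"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # token counts per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic, the usual starting point for manual naming."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Toy corpus (hypothetical bug-report titles).
raw_docs = [
    "The app crashes on login",
    "Login page crashes when password is empty",
    "Build fails in the test pipeline",
    "Test pipeline is slow and build fails",
]
docs = [preprocess(text) for text in raw_docs]
topics = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(topics)):
    print(f"topic {k}: {words}")
```

A person would then read each printed word list and assign a label by hand, which corresponds to the manual naming practice the study reports as common. On a corpus this small the resulting clusters depend heavily on the seed and the hyperparameters, which echoes the abstract's point that pre-processing and parameter choices need to be reported.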
Source
Empirical Software Engineering, 2021, 26 (06)