Source Code Retrieval on StackOverflow Using LDA

Times cited: 0
Authors
Arwan, Achmad [1]
Rochimah, Siti [2]
Akbar, Rizky Januar [2]
Affiliations
[1] Universitas Brawijaya, Faculty of Computer Science, Department of Informatics, Malang, Indonesia
[2] Institut Teknologi Sepuluh Nopember, Faculty of Information Technology, Department of Informatics, Surabaya, Indonesia
Keywords
Source Code Searching; Concept Location; Latent Dirichlet Allocation
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Internet code search is a popular research area. StackOverflow allows developers to ask and answer questions about code. A previous approach to searching code on StackOverflow uses the tf-idf method, which is based on word occurrence counts, to recommend source code. This method has the disadvantage of treating variable and method identifiers as ordinary words, even though identifiers are often combinations of two or more words. For example, given an identifier named "randomString", a search for the keyword "random" will probably not retrieve it, because "random" and "randomString" are treated as different words. Concept location can tackle this problem. Concept location has been widely used to relate code to specific concepts or features. Previous concept location research focused only on source code comments and on the relations among objects within the source code. This research proposes a mechanism for finding code on StackOverflow using Latent Dirichlet Allocation (LDA), with concept location applied in the preprocessing stage. Questions, answers, and code snippets about Java programming are downloaded from StackOverflow to a local repository. Corpora are generated by extracting the questions, answers, and code snippets. The LDA algorithm is then used to infer concept locations from the source code. Developers query concepts, and the system recommends source code based on the relevant concepts. The experimental results show that the system recommends source code with an average precision of 48% and an average recall of 58%.
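The identifier problem and the LDA-based retrieval described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation: the camelCase splitting rule, the collapsed Gibbs sampler, the hyperparameters (`alpha`, `beta`, `iterations`, `seed`), and the query-likelihood ranking rule are all assumptions, because the abstract does not state how identifiers are split, how LDA is trained, or how relevance is scored. All function names are hypothetical.

```python
import math
import random
import re


# Hypothetical preprocessing rule: split identifiers on underscores/digits and camelCase boundaries.
def split_identifier(token):
    """'randomString' -> ['random', 'string']; 'HTTPServer_port' -> ['http', 'server', 'port']."""
    words = []
    for part in re.split(r"[_\W\d]+", token):
        # Acronym run before a capitalized word | (Capitalized) lowercase word | trailing acronym.
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", part))
    return [w.lower() for w in words]


def tokenize(text):
    """Extract identifier-like tokens from prose or code and split each one into terms."""
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        terms.extend(split_identifier(token))
    return terms


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # tokens of each topic in each document
    nkw = [[0] * V for _ in range(K)]   # occurrences of each word in each topic
    nk = [0] * K                        # tokens assigned to each topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                ndk[d][k] -= 1          # remove the token's current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return theta, phi, vocab


def rank(query, theta, phi, vocab):
    """Order document indices by log p(query | doc) = sum_w log sum_k theta[d][k] * phi[k][w]."""
    terms = [w for w in tokenize(query) if w in vocab]   # terms unseen in training are ignored
    scores = []
    for d, topics in enumerate(theta):
        score = sum(math.log(sum(p * phi[k][vocab[w]] for k, p in enumerate(topics)))
                    for w in terms)
        scores.append((score, d))
    return [d for _, d in sorted(scores, reverse=True)]


if __name__ == "__main__":
    snippets = [
        "String randomString = UUID.randomUUID().toString();",
        "Random rng = new Random(); int n = rng.nextInt(10);",
        "BufferedReader reader = new BufferedReader(new FileReader(path));",
        "List<String> lines = Files.readAllLines(Paths.get(fileName));",
    ]
    docs = [tokenize(s) for s in snippets]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2)
    print(rank("random", theta, phi, vocab))   # snippet indices, most relevant first
```

Splitting turns `randomString` into the terms `random` and `string`, so a query for "random" can reach that snippet; exact-token matching, as in plain tf-idf over unsplit identifiers, would treat `randomString` as a single unrelated word. On a corpus this small the sampled topics are noisy, so the printed ranking may change with `seed` and `num_topics`.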
Pages: 295-299
Number of pages: 5