Source Code Retrieval on StackOverflow Using LDA

Cited by: 0
Authors
Arwan, Achmad [1]
Rochimah, Siti [2]
Akbar, Rizky Januar [2]
Affiliations
[1] Univ Brawijaya, Fac Comp Sci, Dept Informat, Malang, Indonesia
[2] Inst Teknol Sepuluh Nopember, Fac Informat Technol, Dept Informat, Surabaya, Indonesia
Keywords
Source Code Searching; Concept Location; Latent Dirichlet Allocation
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Internet code search is a popular research area. StackOverflow allows developers to ask and answer questions about code. The previous approach to searching code on StackOverflow uses the tf-idf method, which recommends source code based on the number of word occurrences. This method has the disadvantage that variable and method identifiers are treated as ordinary words, even though identifiers are often a combination of two or more words. For example, consider an identifier named "randomString". If we search with the keyword "random", the system will probably not recommend "randomString" because the two tokens are different. Concept location can tackle this problem. Concept location has been widely used to find the correlation between code and specific concepts or features. Previous research on concept location focused only on source code comments and on relations among objects within the source code. This research proposes a mechanism for finding code on StackOverflow using Latent Dirichlet Allocation (LDA), with concept location applied in the preprocessing stage. Questions, answers, and code snippets about Java programming are downloaded from StackOverflow to a local repository. Corpora are generated by extracting the questions, answers, and code snippets. Concept locations are inferred from the source code using the LDA algorithm. Developers query concepts, and the system then recommends source code based on the relevant concepts. The experimental results show that the system is able to recommend source code with an average precision of 48% and an average recall of 58%.
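The identifier problem described in the abstract can be illustrated with a small sketch. It assumes that preprocessing splits camelCase and snake_case identifiers into their component words before indexing; the abstract does not say how the authors actually split identifiers, and the function names `split_identifier` and `matches` are hypothetical.

```python
import re


def split_identifier(identifier: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lowercase words.

    This is an assumed preprocessing step, not the authors' confirmed method.
    """
    words = []
    # First break on underscores and other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then break on case boundaries: acronyms, capitalized words, digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def matches(query: str, identifier: str) -> bool:
    """Return True if the query word is one of the identifier's component words."""
    return query.lower() in split_identifier(identifier)


if __name__ == "__main__":
    # Treated as a whole word, the identifier never equals the query.
    print("random" == "randomString")
    # After splitting, the query matches one of the component words.
    print(split_identifier("randomString"))
    print(matches("random", "randomString"))
```

With whole-token matching, "random" and "randomString" are unrelated terms; once the identifier is split, the word "random" enters the corpus and can contribute to the term counts that a topic model such as LDA is trained on.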
Pages: 295-299
Number of pages: 5