From Word Embeddings To Document Similarities for Improved Information Retrieval in Software Engineering

Cited by: 203

Authors:

Ye, Xin [1]

Shen, Hui [1]

Ma, Xiao [1]

Bunescu, Razvan [1]

Liu, Chang [1]

Affiliation:

[1] Ohio University, School of Electrical Engineering and Computer Science, Athens, OH 45701, USA

Source:

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) | 2016

Funding:

U.S. National Science Foundation;

Keywords:

Word embeddings; skip-gram model; bug localization; bug reports; API documents;

DOI:

10.1145/2884781.2884862

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

The application of information retrieval techniques to search tasks in software engineering is made difficult by the lexical gap between search queries, usually expressed in natural language (e.g., English), and retrieved documents, usually expressed in code (e.g., programming languages). This is often the case in bug and feature location, community question answering, or more generally the communication between technical personnel and non-technical stakeholders in a software project. In this paper, we propose bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space. In the proposed architecture, word embeddings are first trained on API documents, tutorials, and reference documents, and then aggregated in order to estimate semantic similarities between documents. Empirical evaluations show that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly defined task of linking API documents to computer programming questions.
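
The idea of aggregating word embeddings into a document-level similarity can be illustrated with a minimal sketch. The abstract does not say how the embeddings are aggregated, so the mean pooling and cosine similarity used here are assumptions, not the authors' method. The vocabulary, the 3-dimensional vectors, and the function names are all hypothetical and invented for illustration; in practice the vectors would come from a model trained on API documents, tutorials, and reference documents.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real vectors would be learned (e.g., by a skip-gram model) and
# would have far more dimensions.
TOY_EMBEDDINGS = {
    "open":   [0.9, 0.1, 0.0],
    "file":   [0.8, 0.2, 0.1],
    "read":   [0.7, 0.3, 0.0],
    "stream": [0.6, 0.4, 0.1],
    "color":  [0.0, 0.1, 0.9],
    "paint":  [0.1, 0.0, 0.8],
}


def document_vector(tokens, embeddings):
    """Mean-pool the vectors of in-vocabulary tokens (assumed aggregation).

    Returns None when no token has an embedding.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(query_tokens, doc_tokens, embeddings):
    """Similarity of two token lists in the shared embedding space."""
    q = document_vector(query_tokens, embeddings)
    d = document_vector(doc_tokens, embeddings)
    if q is None or d is None:
        return 0.0
    return cosine(q, d)


if __name__ == "__main__":
    query = ["open", "file"]            # stands in for a natural-language query
    related = ["read", "stream"]        # shares no word with the query
    unrelated = ["color", "paint"]
    print(document_similarity(query, related, TOY_EMBEDDINGS))
    print(document_similarity(query, unrelated, TOY_EMBEDDINGS))
```

With these toy vectors, the query scores higher against the document about reading streams than against the one about painting, even though it shares no words with either. That is the lexical-gap effect the abstract describes, shown here only on made-up data.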

Pages: 404-415

Page count: 12

