From Word Embeddings To Document Similarities for Improved Information Retrieval in Software Engineering

Cited by: 203

Authors:

Ye, Xin [1]

Shen, Hui [1]

Ma, Xiao [1]

Bunescu, Razvan [1]

Liu, Chang [1]

Affiliation:

[1] Ohio Univ, Sch Elect Engn & Comp Sci, Athens, OH 45701 USA

Source:

2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) | 2016

Funding:

National Science Foundation (USA)

Keywords:

Word embeddings; skip-gram model; bug localization; bug reports; API documents

DOI:

10.1145/2884781.2884862

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

The application of information retrieval techniques to search tasks in software engineering is made difficult by the lexical gap between search queries, usually expressed in natural language (e.g., English), and retrieved documents, usually expressed in code (e.g., programming languages). This is often the case in bug and feature location, community question answering, or, more generally, in the communication between technical personnel and non-technical stakeholders in a software project. In this paper, we propose bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space. In the proposed architecture, word embeddings are first trained on API documents, tutorials, and reference documents, and then aggregated in order to estimate semantic similarities between documents. Empirical evaluations show that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly defined task of linking API documents to computer programming questions.
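As a rough illustration of the idea summarized in the abstract, the following sketch represents each document as the average of its word vectors and compares documents by cosine similarity. This is only one simple way to aggregate embeddings and is not taken from the paper, whose aggregation scheme may differ. The `EMBEDDINGS` table, its values, and the function names are hypothetical; real vectors would be learned (for example, with a skip-gram model) from API documents, tutorials, and reference documents.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Real embeddings would be learned from a corpus, not written by hand.
EMBEDDINGS = {
    "open":   [0.90, 0.10, 0.00],
    "file":   [0.80, 0.20, 0.10],
    "read":   [0.70, 0.30, 0.00],
    "stream": [0.75, 0.25, 0.05],
    "button": [0.00, 0.90, 0.30],
    "click":  [0.10, 0.80, 0.40],
}


def document_vector(tokens):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(query_tokens, doc_tokens):
    """Similarity between a query and a document; 0.0 if either has no known word."""
    q = document_vector(query_tokens)
    d = document_vector(doc_tokens)
    if q is None or d is None:
        return 0.0
    return cosine(q, d)


query = ["open", "file"]        # natural-language side, e.g. words from a bug report
code_doc = ["read", "stream"]   # code side, shares no literal word with the query
ui_doc = ["button", "click"]    # unrelated document

print(similarity(query, code_doc))
print(similarity(query, ui_doc))
```

With these toy vectors, the query scores higher against `code_doc` than against `ui_doc` even though it shares no literal word with either, which is the kind of lexical gap the abstract describes.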

Pages: 404-415

Page count: 12
