Two-Stage Attention-Based Model for Code Search with Textual and Structural Features

Cited by: 31
|
Authors
Xu, Ling [1 ,2 ]
Yang, Huanhuan [1 ,2 ]
Liu, Chao [3 ]
Shuai, Jianhang [1 ,2 ]
Yan, Meng [1 ,2 ]
Lei, Yan [1 ,2 ]
Xu, Zhou [1 ,2 ]
Affiliations
[1] Chongqing Univ, Minist Educ, Key Lab Dependable Serv Comp Cyber Phys Soc, Chongqing, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Keywords
code search; attention mechanism; representation learning; code structural feature
DOI
10.1109/SANER50967.2021.00039
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Searching and reusing existing code from a large-scale codebase can greatly improve developers' programming efficiency. To support code reuse, early code search models leverage information retrieval (IR) techniques to index a large-scale code corpus and return relevant code according to a developer's search query. However, IR-based models fail to capture the semantics of code and query. To tackle this issue, researchers applied deep learning (DL) techniques to code search models. However, these models are either too complex to retrieve relevant code efficiently or learn the semantic correlation between code and query inadequately. To bridge the semantic gap between code and query effectively and efficiently, we propose a code search model, TabCS (Two-stage Attention-Based model for Code Search), in this study. TabCS extracts code and query information from the code textual features (i.e., method name, API sequence, and tokens), the code structural feature (i.e., abstract syntax tree), and the query feature (i.e., tokens). TabCS adopts a two-stage attention network structure. The first stage leverages attention mechanisms to extract semantics from code and query separately, considering their semantic gap. The second stage leverages a co-attention mechanism to capture their semantic correlation and learn better code/query representations. We evaluate the performance of TabCS on two existing large-scale datasets with 485k and 542k code snippets, respectively. Experimental results show that TabCS achieves an MRR of 0.57 on Hu et al.'s dataset, outperforming three state-of-the-art models, CARLCS-CNN, DeepCS, and UNIF, by 18%, 70%, and 12%, respectively. Meanwhile, TabCS gains an MRR of 0.54 on Husain et al.'s dataset, outperforming CARLCS-CNN, DeepCS, and UNIF by 32%, 76%, and 29%, respectively.
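
The abstract describes the two-stage design only at a high level. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the class names, layer shapes, bilinear affinity form, and pooling choices are assumptions made purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionPool(nn.Module):
    """Stage 1 (assumed form): score each token embedding, pool to one vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                      # x: (batch, seq, dim)
        w = F.softmax(self.score(x), dim=1)    # attention weight per token
        return (w * x).sum(dim=1)              # pooled vector: (batch, dim)

class CoAttention(nn.Module):
    """Stage 2 (assumed form): correlate code and query token matrices."""
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, code, query):            # (batch, n, dim), (batch, m, dim)
        # Affinity between every code token and every query token.
        A = torch.tanh(code @ self.U @ query.transpose(1, 2))   # (batch, n, m)
        # Row/column max-pooling scores each token by its best match.
        a_code = F.softmax(A.max(dim=2).values, dim=1)          # (batch, n)
        a_query = F.softmax(A.max(dim=1).values, dim=1)         # (batch, m)
        c = (a_code.unsqueeze(2) * code).sum(dim=1)             # pooled code vector
        q = (a_query.unsqueeze(2) * query).sum(dim=1)           # pooled query vector
        return c, q

def mrr(ranks):
    """Mean Reciprocal Rank of the first relevant result per query,
    e.g., mrr([1, 2, 4]) == (1 + 0.5 + 0.25) / 3."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Dummy embedded inputs; in TabCS each code feature (method name, API
# sequence, tokens, AST) and the query would be embedded separately.
code_tokens = torch.randn(4, 50, 128)
query_tokens = torch.randn(4, 10, 128)
name_vec = SelfAttentionPool(128)(torch.randn(4, 6, 128))  # stage-1 pooling of one feature
c, q = CoAttention(128)(code_tokens, query_tokens)         # stage-2 co-attention
score = F.cosine_similarity(c, q)                          # ranking score per pair

Ranking candidates by the cosine similarity of the pooled vectors, and reporting MRR over the rank of the first correct snippet, matches the evaluation the abstract reports; the max-pooled bilinear affinity is one common co-attention variant, used here only to make the sketch concrete.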
Pages: 342-353
Number of pages: 12