Two-Stage Attention-Based Model for Code Search with Textual and Structural Features

Cited by: 31
|
Authors
Xu, Ling [1 ,2 ]
Yang, Huanhuan [1 ,2 ]
Liu, Chao [3 ]
Shuai, Jianhang [1 ,2 ]
Yan, Meng [1 ,2 ]
Lei, Yan [1 ,2 ]
Xu, Zhou [1 ,2 ]
Affiliations
[1] Chongqing Univ, Minist Educ, Key Lab Dependable Serv Comp Cyber Phys Soc, Chongqing, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Keywords
code search; attention mechanism; representation learning; code structural feature
DOI
10.1109/SANER50967.2021.00039
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Searching and reusing existing code from a large-scale codebase can greatly improve developers' programming efficiency. To support code reuse, early code search models leverage information retrieval (IR) techniques to index a large-scale code corpus and return relevant code according to a developer's search query. However, IR-based models fail to capture the semantics of code and query. To tackle this issue, researchers applied deep learning (DL) techniques to code search models. However, these models are either too complex to retrieve relevant code efficiently or learn the semantic correlation between code and query inadequately. To bridge the semantic gap between code and query effectively and efficiently, we propose a code search model, TabCS (Two-stage Attention-Based model for Code Search), in this study. TabCS extracts code and query information from the code textual features (i.e., method name, API sequence, and tokens), the code structural feature (i.e., abstract syntax tree), and the query feature (i.e., tokens). TabCS adopts a two-stage attention network structure. The first stage leverages attention mechanisms to extract semantics from code and query separately, considering their semantic gap. The second stage leverages a co-attention mechanism to capture their semantic correlation and learn better code/query representations. We evaluate the performance of TabCS on two existing large-scale datasets with 485k and 542k code snippets, respectively. Experimental results show that TabCS achieves an MRR of 0.57 on Hu et al.'s dataset, outperforming three state-of-the-art models, CARLCS-CNN, DeepCS, and UNIF, by 18%, 70%, and 12%, respectively. Meanwhile, TabCS gains an MRR of 0.54 on Husain et al.'s dataset, outperforming CARLCS-CNN, DeepCS, and UNIF by 32%, 76%, and 29%, respectively.
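
The abstract describes the two-stage design only at a high level. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the class names, layer shapes, bilinear affinity form, and pooling choices are assumptions made purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionPool(nn.Module):
    """Stage 1 (assumed form): score each token embedding, pool to one vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                      # x: (batch, seq, dim)
        w = F.softmax(self.score(x), dim=1)    # attention weight per token
        return (w * x).sum(dim=1)              # pooled vector: (batch, dim)

class CoAttention(nn.Module):
    """Stage 2 (assumed form): correlate code and query token matrices."""
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, code, query):            # (batch, n, dim), (batch, m, dim)
        # Affinity between every code token and every query token.
        A = torch.tanh(code @ self.U @ query.transpose(1, 2))   # (batch, n, m)
        # Row/column max-pooling scores each token by its best match.
        a_code = F.softmax(A.max(dim=2).values, dim=1)          # (batch, n)
        a_query = F.softmax(A.max(dim=1).values, dim=1)         # (batch, m)
        c = (a_code.unsqueeze(2) * code).sum(dim=1)             # pooled code vector
        q = (a_query.unsqueeze(2) * query).sum(dim=1)           # pooled query vector
        return c, q

def mrr(ranks):
    """Mean Reciprocal Rank of the first relevant result per query,
    e.g., mrr([1, 2, 4]) == (1 + 0.5 + 0.25) / 3."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Dummy embedded inputs; in TabCS each code feature (method name, API
# sequence, tokens, AST) and the query would be embedded separately.
code_tokens = torch.randn(4, 50, 128)
query_tokens = torch.randn(4, 10, 128)
name_vec = SelfAttentionPool(128)(torch.randn(4, 6, 128))  # stage-1 pooling of one feature
c, q = CoAttention(128)(code_tokens, query_tokens)         # stage-2 co-attention
score = F.cosine_similarity(c, q)                          # ranking score per pair

Ranking candidates by the cosine similarity of the pooled vectors, and reporting MRR over the rank of the first correct snippet, matches the evaluation the abstract reports; the max-pooled bilinear affinity is one common co-attention variant, used here only to make the sketch concrete.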
Pages: 342-353
Number of pages: 12