Cross-Modal Contrastive Learning for Code Search

Cited by: 2
|
Authors
Shi, Zejian [1 ]
Xiong, Yun [1 ,2 ]
Zhang, Xiaolong [1 ]
Zhang, Yao [1 ]
Li, Shanshan [3 ]
Zhu, Yangyong [1 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Data Sci, Shanghai, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
code search; code representation; data augmentation; contrastive learning;
DOI
10.1109/ICSME55016.2022.00017
CLC Number
TP31 [Computer Software];
Subject Classification Code
081202; 0835;
Abstract
Code search aims to retrieve code snippets from natural language queries and serves as a core technology for improving development efficiency. Previous approaches have achieved promising results by learning code and query representations with BERT-based pre-trained models; however, these models suffer from a semantic collapse problem, i.e., the native representations of code and queries cluster within a high-similarity interval. In this paper, we propose CrossCS, a cross-modal contrastive learning method for code search, to improve the representations of code and queries through explicit fine-grained contrastive objectives. Specifically, we design a novel and effective contrastive objective that considers not only the similarity between modalities but also the similarity within modalities. To maintain the semantic consistency of code snippets under different function and variable names, we use data augmentation that renames functions and variables to meaningless tokens, which enables us to add comparisons between code and augmented code within the code modality. Moreover, to further improve the effectiveness of pre-trained models, we rank candidate code snippets using similarity scores weighted by retrieval scores and classification scores. Comprehensive experiments demonstrate that our method significantly improves the effectiveness of pre-trained models for code search.
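The two ingredients described in the abstract (identifier-renaming augmentation and a contrastive objective that mixes a cross-modal query-code term with an intra-modal code-augmented-code term) can be sketched roughly as below. This is a minimal illustration assuming an InfoNCE-style formulation; the `rename_identifiers` helper, the equal weighting of the two terms, and the temperature value are assumptions, not the exact CrossCS implementation.

```python
# Minimal sketch (assumed formulation), not the paper's exact implementation.
import re
import torch
import torch.nn.functional as F


def rename_identifiers(code: str, names: list[str]) -> str:
    """Replace given function/variable names with meaningless tokens (assumed augmentation scheme)."""
    for i, name in enumerate(names):
        code = re.sub(rf"\b{re.escape(name)}\b", f"var{i}", code)
    return code


def contrastive_loss(query_emb, code_emb, aug_code_emb, temperature=0.07):
    """Cross-modal (query vs. code) plus intra-modal (code vs. augmented code) InfoNCE terms."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    a = F.normalize(aug_code_emb, dim=-1)
    labels = torch.arange(q.size(0))

    # Between modalities: each query should be closest to its paired code snippet.
    cross = F.cross_entropy(q @ c.t() / temperature, labels)
    # Within the code modality: each snippet should be closest to its renamed variant.
    intra = F.cross_entropy(c @ a.t() / temperature, labels)
    return cross + intra


if __name__ == "__main__":
    # Toy usage with random embeddings standing in for encoder outputs.
    torch.manual_seed(0)
    q, c, a = torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)
    print(rename_identifiers("def add(x, y): return x + y", ["add", "x", "y"]))
    print(contrastive_loss(q, c, a).item())
```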
Pages: 94 - 105
Page count: 12
Related Papers
50 records in total
  • [1] Cross-modal Contrastive Learning for Speech Translation
    Ye, Rong
    Wang, Mingxuan
    Li, Lei
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5099 - 5113
  • [2] Cross-modal contrastive learning for multimodal sentiment recognition
    Yang, Shanliang
    Cui, Lichao
    Wang, Lei
    Wang, Tao
    [J]. APPLIED INTELLIGENCE, 2024, 54 (05) : 4260 - 4276
  • [3] Cross-Modal Graph Contrastive Learning with Cellular Images
    Zheng, Shuangjia
    Rao, Jiahua
    Zhang, Jixian
    Zhou, Lianyu
    Xie, Jiancong
    Cohen, Ethan
    Lu, Wei
    Li, Chengtao
    Yang, Yuedong
    [J]. ADVANCED SCIENCE, 2024, 11 (32)
  • [4] TRAJCROSS: Trajectory Cross-Modal Retrieval with Contrastive Learning
    Jing, Quanliang
    Yao, Di
    Gong, Chang
    Fan, Xinxin
    Wang, Baoli
    Tan, Haining
    Bi, Jingping
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 344 - 349
  • [6] Cross-modal contrastive learning for aspect-based recommendation
    Won, Heesoo
    Oh, Byungkook
    Yang, Hyeongjun
    Lee, Kyong-Ho
    [J]. INFORMATION FUSION, 2023, 99
  • [7] Cross-Modal Contrastive Learning for Text-to-Image Generation
    Zhang, Han
    Koh, Jing Yu
    Baldridge, Jason
    Lee, Honglak
    Yang, Yinfei
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 833 - 842
  • [8] Cross-modal Contrastive Learning for Multimodal Fake News Detection
    Wang, Longzheng
    Zhang, Chuang
    Xu, Hongbo
    Xu, Yongxiu
    Xu, Xiaohan
    Wang, Siqi
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5696 - 5704
  • [9] Improving Spoken Language Understanding with Cross-Modal Contrastive Learning
    Dong, Jingjing
    Fu, Jiayi
    Zhou, Peng
    Li, Hao
    Wang, Xiaorui
    [J]. INTERSPEECH 2022, 2022, : 2693 - 2697
  • [10] Enriched Music Representations With Multiple Cross-Modal Contrastive Learning
    Ferraro, Andres
    Favory, Xavier
    Drossos, Konstantinos
    Kim, Yuntae
    Bogdanov, Dmitry
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 733 - 737