Contrastive Code-Comment Pre-training

Cited by: 0
Authors
Pei, Xiaohuan [1 ]
Liu, Daochang [1 ]
Qian, Luo [1 ]
Xu, Chang [1 ]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Fac Engn, Sydney, Australia
Funding
Australian Research Council;
Keywords
contrastive learning; representation learning; pre-training; programming language processing;
DOI
10.1109/ICDM54844.2022.00050
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained models for Natural Languages (NL) have recently been shown to transfer well to Programming Languages (PL) and to largely benefit a range of intelligent code-related tasks, such as code search, clone detection, program translation, and code documentation generation. However, existing pre-training methods for programming languages mainly rely on masked language modeling and next sentence prediction at the token or graph level. This restricted form limits their performance and transferability, since PL and NL have different syntax rules and the downstream tasks require a multi-modal representation. Here we introduce C3P, a Contrastive Code-Comment Pre-training approach, which solves various downstream tasks by pre-training multi-representation features on both programming and natural syntax. The model encodes the code syntax and its natural language description (comment) with two encoders, and the encoded embeddings are projected into a multi-modal space to learn a latent representation. In this latent space, C3P jointly trains the code and comment encoders with a symmetric loss function, which maximizes the cosine similarity of correct code-comment pairs while minimizing the similarity of unrelated pairs. We verify the empirical performance of the proposed pre-trained models on multiple downstream code-related tasks. Comprehensive experiments demonstrate that C3P outperforms previous work on the understanding tasks of code search and code clone detection, as well as on the generation tasks of program translation and documentation generation. Furthermore, we validate the transferability of C3P to a new programming language not seen during pre-training. The results show that our model surpasses all supervised methods and, for some programming languages, even outperforms prior pre-trained approaches. Code is available at https://github.com/TerryPei/C3P.
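The symmetric loss described in the abstract corresponds to a CLIP-style contrastive objective computed over the projected code and comment embeddings: in-batch correct pairs sit on the diagonal of a similarity matrix and serve as positives, while all other pairs in the batch act as negatives. The sketch below is an illustrative PyTorch implementation under that assumption, not the authors' released code; the function name, temperature value, and tensor shapes are hypothetical.

```python
# Minimal sketch of a symmetric code-comment contrastive loss (CLIP-style
# InfoNCE). All names and hyperparameters here are illustrative assumptions,
# not taken from the C3P paper or the linked repository.
import torch
import torch.nn.functional as F


def symmetric_code_comment_loss(code_emb: torch.Tensor,
                                comment_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """code_emb, comment_emb: (batch, dim) projections into the shared space."""
    # L2-normalize so that dot products equal cosine similarities.
    code_emb = F.normalize(code_emb, dim=-1)
    comment_emb = F.normalize(comment_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the correct pairs.
    logits = code_emb @ comment_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: code-to-comment and comment-to-code.
    loss_c2t = F.cross_entropy(logits, targets)
    loss_t2c = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_c2t + loss_t2c)
```

Averaging the two directional losses keeps the objective symmetric, so neither encoder is favored; pushing up the diagonal entries and pushing down the off-diagonal entries realizes the "maximize similarity of correct pairs, minimize similarity of unrelated pairs" behavior stated in the abstract.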
Pages: 398 - 407
Number of pages: 10
Related Papers
50 records in total
  • [1] Contrastive Pre-Training of GNNs on Heterogeneous Graphs
    Jiang, Xunqiang
    Lu, Yuanfu
    Fang, Yuan
    Shi, Chuan
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 803 - 812
  • [2] Adversarial momentum-contrastive pre-training
    Xu, Cong
    Li, Dan
    Yang, Min
    [J]. PATTERN RECOGNITION LETTERS, 2022, 160 : 172 - 179
  • [3] Robust Pre-Training by Adversarial Contrastive Learning
    Jiang, Ziyu
    Chen, Tianlong
    Chen, Ting
    Wang, Zhangyang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Temporal Contrastive Pre-Training for Sequential Recommendation
    Tian, Changxin
    Lin, Zihan
    Bian, Shuqing
    Wang, Jinpeng
    Zhao, Wayne Xin
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1925 - 1934
  • [5] New Intent Discovery with Pre-training and Contrastive Learning
    Zhang, Yuwei
    Zhang, Haode
    Zhan, Li-Ming
    Wu, Xiao-Ming
    Lam, Albert Y. S.
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 256 - 269
  • [6] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei
    Liu, Kang
    Wang, Yequan
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [7] Multi-Modal Contrastive Pre-training for Recommendation
    Liu, Zhuang
    Ma, Yunpu
    Schubert, Matthias
    Ouyang, Yuanxin
    Xiong, Zhang
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 99 - 108
  • [8] Image Difference Captioning with Pre-training and Contrastive Learning
    Yao, Linli
    Wang, Weiying
    Jin, Qin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3108 - 3116
  • [9] CODE: Contrastive Pre-training with Adversarial Fine-Tuning for Zero-Shot Expert Linking
    Chen, Bo
    Zhang, Jing
    Zhang, Xiaokang
    Tang, Xiaobin
    Cai, Lingfan
    Chen, Hong
    Li, Cuiping
    Zhang, Peng
    Tang, Jie
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11846 - 11854
  • [10] Contrastive Vision-Language Pre-training with Limited Resources
    Cui, Quan
    Zhou, Boyan
    Guo, Yu
    Yin, Weidong
    Wu, Hao
    Yoshie, Osamu
    Chen, Yubo
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 236 - 253