Contrastive Code-Comment Pre-training

Cited by: 0
Authors
Pei, Xiaohuan [1 ]
Liu, Daochang [1 ]
Qian, Luo [1 ]
Xu, Chang [1 ]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Fac Engn, Sydney, Australia
Funding
Australian Research Council;
Keywords
contrastive learning; representation learning; pre-training; programming language processing;
DOI
10.1109/ICDM54844.2022.00050
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained models for Natural Languages (NL) have recently been shown to transfer well to Programming Languages (PL) and to largely benefit a range of intelligent code-related tasks, such as code search, clone detection, program translation, and code documentation generation. However, existing pre-training methods for programming languages mainly rely on masked language modeling and next sentence prediction at the token or graph level. This restricted form limits their performance and transferability, since PL and NL have different syntax rules and the downstream tasks require a multi-modal representation. Here we introduce C3P, a Contrastive Code-Comment Pre-training approach, which solves various downstream tasks by pre-training multi-representation features on both programming and natural syntax. The model encodes the code syntax and its natural language description (comment) with two encoders, and the encoded embeddings are projected into a multi-modal space to learn a latent representation. In this latent space, C3P jointly trains the code and comment encoders with a symmetric loss function, which maximizes the cosine similarity of correct code-comment pairs while minimizing the similarity of unrelated pairs. We verify the empirical performance of the proposed pre-trained models on multiple downstream code-related tasks. Comprehensive experiments demonstrate that C3P outperforms previous work on the understanding tasks of code search and code clone detection, as well as on the generation tasks of program translation and documentation generation. Furthermore, we validate the transferability of C3P to a new programming language not seen during pre-training. The results show that our model surpasses all supervised methods and, for some programming languages, even outperforms prior pre-trained approaches. Code is available at https://github.com/TerryPei/C3P.
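The symmetric loss described in the abstract corresponds to a CLIP-style contrastive objective computed over the projected code and comment embeddings: in-batch correct pairs sit on the diagonal of a similarity matrix and serve as positives, while all other pairs in the batch act as negatives. The sketch below is an illustrative PyTorch implementation under that assumption, not the authors' released code; the function name, temperature value, and tensor shapes are hypothetical.

```python
# Minimal sketch of a symmetric code-comment contrastive loss (CLIP-style
# InfoNCE). All names and hyperparameters here are illustrative assumptions,
# not taken from the C3P paper or the linked repository.
import torch
import torch.nn.functional as F


def symmetric_code_comment_loss(code_emb: torch.Tensor,
                                comment_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """code_emb, comment_emb: (batch, dim) projections into the shared space."""
    # L2-normalize so that dot products equal cosine similarities.
    code_emb = F.normalize(code_emb, dim=-1)
    comment_emb = F.normalize(comment_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the correct pairs.
    logits = code_emb @ comment_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: code-to-comment and comment-to-code.
    loss_c2t = F.cross_entropy(logits, targets)
    loss_t2c = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_c2t + loss_t2c)
```

Averaging the two directional losses keeps the objective symmetric, so neither encoder is favored; pushing up the diagonal entries and pushing down the off-diagonal entries realizes the "maximize similarity of correct pairs, minimize similarity of unrelated pairs" behavior stated in the abstract.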
Pages: 398 - 407
Number of pages: 10
Related Papers
50 records in total
  • [1] Contrastive Pre-Training of GNNs on Heterogeneous Graphs
    Jiang, Xunqiang
    Lu, Yuanfu
    Fang, Yuan
    Shi, Chuan
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 803 - 812
  • [2] Adversarial momentum-contrastive pre-training
    Xu, Cong
    Li, Dan
    Yang, Min
    [J]. PATTERN RECOGNITION LETTERS, 2022, 160 : 172 - 179
  • [3] Robust Pre-Training by Adversarial Contrastive Learning
    Jiang, Ziyu
    Chen, Tianlong
    Chen, Ting
    Wang, Zhangyang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Temporal Contrastive Pre-Training for Sequential Recommendation
    Tian, Changxin
    Lin, Zihan
    Bian, Shuqing
    Wang, Jinpeng
    Zhao, Wayne Xin
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1925 - 1934
  • [5] New Intent Discovery with Pre-training and Contrastive Learning
    Zhang, Yuwei
    Zhang, Haode
    Zhan, Li-Ming
    Wu, Xiao-Ming
    Lam, Albert Y. S.
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 256 - 269
  • [6] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei
    Liu, Kang
    Wang, Yequan
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [7] Multi-Modal Contrastive Pre-training for Recommendation
    Liu, Zhuang
    Ma, Yunpu
    Schubert, Matthias
    Ouyang, Yuanxin
    Xiong, Zhang
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 99 - 108
  • [8] Image Difference Captioning with Pre-training and Contrastive Learning
    Yao, Linli
    Wang, Weiying
    Jin, Qin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3108 - 3116
  • [9] CODE: Contrastive Pre-training with Adversarial Fine-Tuning for Zero-Shot Expert Linking
    Chen, Bo
    Zhang, Jing
    Zhang, Xiaokang
    Tang, Xiaobin
    Cai, Lingfan
    Chen, Hong
    Li, Cuiping
    Zhang, Peng
    Tang, Jie
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11846 - 11854
  • [10] Contrastive Vision-Language Pre-training with Limited Resources
    Cui, Quan
    Zhou, Boyan
    Guo, Yu
    Yin, Weidong
    Wu, Hao
    Yoshie, Osamu
    Chen, Yubo
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 236 - 253