VarGAN: Adversarial Learning of Variable Semantic Representations

Cited by: 0

Authors:

Lin, Yalan [1]

Wan, Chengcheng [2]

Bai, Shuwen [3]

Gu, Xiaodong [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai 200240, Peoples R China

[2] East China Normal Univ, Software Engn Inst, Shanghai 200062, Peoples R China

[3] East China Univ Sci & Technol, Dept Comp Sci, Shanghai 200237, Peoples R China

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024, Vol. 50, No. 6

Funding:

National Natural Science Foundation of China

Keywords:

Codes; Vectors; Generators; Training; Semantics; Task analysis; Generative adversarial networks; Pre-trained language models; variable name representation; identifier representation; generative adversarial networks; CLONE DETECTION;

DOI:

10.1109/TSE.2024.3391730

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Variable names are of critical importance in code representation learning. However, due to diverse naming conventions, variables often receive arbitrary names, leading to long-tail, out-of-vocabulary (OOV), and other well-known problems. While the Byte-Pair Encoding (BPE) tokenizer has addressed the surface-level recognition of low-frequency tokens, it does not address the inadequate training that low-frequency identifiers receive in code representation models, which leaves rare and common identifiers with an imbalanced distribution. Consequently, code representation models struggle to capture the semantics of low-frequency variable names. In this paper, we propose VarGAN, a novel method for variable name representations. VarGAN strengthens the training of low-frequency variables through adversarial training. Specifically, we regard the code representation model as a generator responsible for producing vectors from source code. Additionally, we employ a discriminator that detects whether the code input to the generator contains low-frequency variables. This adversarial setup regularizes the distribution of rare variables, making them overlap with their corresponding high-frequency counterparts in the vector space. Experimental results demonstrate that VarGAN enables CodeBERT to generate code vectors that are more uniformly distributed across low- and high-frequency identifiers. On the IdBench benchmark, similarity and relatedness scores improve by 8% compared to VarCLR. VarGAN is also validated in downstream tasks, where it exhibits enhanced capabilities in capturing token- and code-level semantics.
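The generator/discriminator setup described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar "embeddings" in place of CodeBERT vectors, a logistic-regression discriminator, and a flipped-label generator objective. All names (`train`, `gap`, `is_rare`, the learning rates) are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean(xs):
    return sum(xs) / len(xs)


def train(embeddings, is_rare, steps=20, lr_d=0.1, lr_g=0.1):
    """Alternate discriminator and generator updates on toy 1-D embeddings.

    Assumption: each scalar stands in for a code vector, and is_rare[i] is 1
    when the corresponding input contains a low-frequency variable.
    """
    emb = list(embeddings)
    w, b = 0.0, 0.0  # discriminator parameters (logistic regression)
    n = len(emb)
    for _ in range(steps):
        # Discriminator step: minimize binary cross-entropy for the
        # rare-vs-common label, using the current embeddings.
        probs = [sigmoid(w * e + b) for e in emb]
        grad_w = sum((p - y) * e for p, y, e in zip(probs, is_rare, emb)) / n
        grad_b = sum(p - y for p, y in zip(probs, is_rare)) / n
        w -= lr_d * grad_w
        b -= lr_d * grad_b

        # Generator step: move each embedding so the discriminator is
        # pushed toward the flipped label (1 - y), i.e. try to fool it.
        for i, y in enumerate(is_rare):
            p = sigmoid(w * emb[i] + b)
            grad_e = (p - (1 - y)) * w
            emb[i] -= lr_g * grad_e
    return emb, (w, b)


def gap(emb, is_rare):
    """Difference between the mean rare and mean common embedding."""
    rare = [e for e, y in zip(emb, is_rare) if y == 1]
    common = [e for e, y in zip(emb, is_rare) if y == 0]
    return mean(rare) - mean(common)


# Toy data: common identifiers cluster near 0, rare ones near 2.
initial = [0.0, 0.2, -0.2, 2.0, 2.2, 1.8]
labels = [0, 0, 0, 1, 1, 1]

trained, _ = train(initial, labels)
initial_gap = gap(initial, labels)
final_gap = gap(trained, labels)
print(f"gap before: {initial_gap:.3f}, gap after: {final_gap:.3f}")
```

In this toy setting the distance between the rare and common clusters shrinks over the alternating updates, which mirrors the overlap in vector space that the abstract describes. A real setup would add the task loss of the representation model alongside the adversarial term; that part is omitted here.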


Pages: 1505-1517

Page count: 13
