VarGAN: Adversarial Learning of Variable Semantic Representations

Authors
Lin, Yalan [1]
Wan, Chengcheng [2]
Bai, Shuwen [3]
Gu, Xiaodong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai 200240, Peoples R China
[2] East China Normal Univ, Software Engn Inst, Shanghai 200062, Peoples R China
[3] East China Univ Sci & Technol, Dept Comp Sci, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Codes; Vectors; Generators; Training; Semantics; Task analysis; Generative adversarial networks; Pre-trained language models; Variable name representation; Identifier representation; Clone detection
DOI
10.1109/TSE.2024.3391730
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Variable names are critically important in code representation learning. However, because naming conventions are diverse, variables often receive arbitrary names, which leads to long-tail, out-of-vocabulary (OOV), and other well-known problems. While the Byte-Pair Encoding (BPE) tokenizer addresses the surface-level recognition of low-frequency tokens, it does not address the inadequate training that code representation models give to low-frequency identifiers, which results in an imbalanced distribution of rare and common identifiers. Consequently, code representation models struggle to capture the semantics of low-frequency variable names. In this paper, we propose VarGAN, a novel method for variable name representations. VarGAN strengthens the training of low-frequency variables through adversarial training. Specifically, we regard the code representation model as a generator responsible for producing vectors from source code. Additionally, we employ a discriminator that detects whether the code input to the generator contains low-frequency variables. This adversarial setup regularizes the distribution of rare variables, making them overlap with their corresponding high-frequency counterparts in the vector space. Experimental results demonstrate that VarGAN enables CodeBERT to generate code vectors that are more uniformly distributed across low- and high-frequency identifiers. On the IdBench benchmark, VarGAN improves similarity and relatedness scores by 8% over VarCLR. VarGAN is also validated on downstream tasks, where it shows enhanced capabilities in capturing token- and code-level semantics.
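The adversarial setup in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss functions, the frequency threshold, or the network architectures, so everything below is an assumption. The helper names (`count_variables`, `has_rare_variable`, `discriminator_loss`, `generator_adversarial_loss`), the threshold of 2, and the toy corpus are hypothetical. The sketch shows only how rare-variable labels could be derived and how the two players' objectives oppose each other, using a standard binary cross-entropy with the generator's loss taken as its negation.

```python
import math
from collections import Counter

def count_variables(snippets):
    """Count how often each variable name occurs across a corpus."""
    counts = Counter()
    for names in snippets:
        counts.update(names)
    return counts

def has_rare_variable(names, counts, threshold):
    """Return 1 if any variable occurs fewer than `threshold` times, else 0."""
    return int(any(counts[name] < threshold for name in names))

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def discriminator_loss(probs, labels):
    """The discriminator tries to predict the rare-variable label correctly."""
    return sum(bce(p, y) for p, y in zip(probs, labels)) / len(labels)

def generator_adversarial_loss(probs, labels):
    """Assumed objective: the generator (the code representation model) is
    rewarded when the discriminator fails, so it minimizes the negated loss."""
    return -discriminator_loss(probs, labels)

# Hypothetical toy corpus: each snippet is reduced to its variable names.
corpus = [["i", "count"], ["i", "tmp"], ["i", "count", "usrCfgBuf"]]
counts = count_variables(corpus)
labels = [has_rare_variable(names, counts, threshold=2) for names in corpus]

# Discriminator outputs P(snippet contains a rare variable) given its vector.
confident = [0.1, 0.9, 0.9]  # vectors still reveal variable frequency
confused = [0.5, 0.5, 0.5]   # vectors carry no frequency signal

print(labels)
print(discriminator_loss(confident, labels), discriminator_loss(confused, labels))
print(generator_adversarial_loss(confident, labels), generator_adversarial_loss(confused, labels))
```

Under this assumed objective, the generator's loss is lower when the discriminator is confused than when it is confident, which is the pressure that pushes rare-variable vectors toward the region occupied by their high-frequency counterparts. In practice the probabilities would come from a trained discriminator network over CodeBERT outputs rather than from fixed lists.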
Pages: 1505-1517
Page count: 13