VarGAN: Adversarial Learning of Variable Semantic Representations

Authors
Lin, Yalan [1]
Wan, Chengcheng [2]
Bai, Shuwen [3]
Gu, Xiaodong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai 200240, Peoples R China
[2] East China Normal Univ, Software Engn Inst, Shanghai 200062, Peoples R China
[3] East China Univ Sci & Technol, Dept Comp Sci, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Codes; Vectors; Generators; Training; Semantics; Task analysis; Generative adversarial networks; Pre-trained language models; Variable name representation; Identifier representation; Clone detection
DOI
10.1109/TSE.2024.3391730
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Variable names are critically important in code representation learning. However, because naming conventions vary widely, variables often receive arbitrary names, leading to long-tail, out-of-vocabulary (OOV), and other well-known problems. While the Byte-Pair Encoding (BPE) tokenizer addresses the surface-level recognition of low-frequency tokens, it does not address the inadequate training that code representation models give to low-frequency identifiers, which results in an imbalanced distribution of rare and common identifiers. Consequently, code representation models struggle to capture the semantics of low-frequency variable names. In this paper, we propose VarGAN, a novel method for variable name representations. VarGAN strengthens the training of low-frequency variables through adversarial training. Specifically, we regard the code representation model as a generator responsible for producing vectors from source code. Additionally, we employ a discriminator that detects whether the code input to the generator contains low-frequency variables. This adversarial setup regularizes the distribution of rare variables, making them overlap with their corresponding high-frequency counterparts in the vector space. Experimental results demonstrate that VarGAN enables CodeBERT to generate code vectors that are more uniformly distributed across low- and high-frequency identifiers. On the IdBench benchmark, VarGAN improves similarity and relatedness scores by 8% over VarCLR. VarGAN is also validated on downstream tasks, where it shows enhanced capabilities in capturing token- and code-level semantics.
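The generator/discriminator roles described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear discriminator, the binary cross-entropy objectives, and every name below (`discriminator_prob`, `discriminator_loss`, `generator_adversarial_loss`) are assumptions chosen for illustration, and the "code vectors" are hand-written lists standing in for the output of a model such as CodeBERT.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_prob(vec, weights, bias):
    """Hypothetical linear discriminator: probability that `vec` was
    produced from code containing a low-frequency variable."""
    return sigmoid(sum(w * v for w, v in zip(weights, vec)) + bias)


def bce(p, label, eps=1e-12):
    """Binary cross-entropy for one prediction, clamped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(vectors, is_rare, weights, bias):
    """The discriminator tries to tell rare (1) from frequent (0) inputs."""
    losses = [bce(discriminator_prob(v, weights, bias), y)
              for v, y in zip(vectors, is_rare)]
    return sum(losses) / len(losses)


def generator_adversarial_loss(vectors, is_rare, weights, bias):
    """Assumed generator objective: vectors for rare-variable code should
    be scored as frequent (label 0), i.e. they should fool the discriminator."""
    rare = [v for v, y in zip(vectors, is_rare) if y == 1]
    if not rare:
        return 0.0
    losses = [bce(discriminator_prob(v, weights, bias), 0) for v in rare]
    return sum(losses) / len(losses)


# Toy data: one vector from code with a rare variable, one with a frequent one.
weights, bias = [1.0, 0.0], 0.0
separated = [[2.0, 0.0], [-2.0, 0.0]]     # rare vector is easy to spot
overlapping = [[-2.0, 0.0], [-2.0, 0.0]]  # rare vector coincides with frequent one
labels = [1, 0]

d_loss = discriminator_loss(separated, labels, weights, bias)
g_loss_separated = generator_adversarial_loss(separated, labels, weights, bias)
g_loss_overlapping = generator_adversarial_loss(overlapping, labels, weights, bias)
```

In this toy setting the generator's loss is high while the rare vector is easy to distinguish and falls once it coincides with the frequent one, which matches the overlap effect the abstract describes. A real setup would alternate gradient updates between the two networks.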
Pages: 1505-1517
Page count: 13