Double Consistency Regularization for Transformer Networks

Cited by: 1
Authors
Wan, Yuxian [1 ]
Zhang, Wenlin [1 ]
Li, Zhen [1 ]
Affiliations
[1] PLA Strategic Support Force Information Engineering University, School of Information System Engineering, Zhengzhou 450001, People's Republic of China
Keywords
cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
DOI
10.3390/electronics12204357
CLC number
TP [Automation technology, computer technology]
Subject classification
0812
Abstract
Large, deep neural networks based on the Transformer architecture are very powerful on sequence tasks, but they are prone to overfitting when training data are scarce. Moreover, their prediction quality degrades significantly when the input is slightly perturbed. In this work, we propose a double consistency regularization (DOCR) method for end-to-end encoder-decoder models that separately constrains the outputs of the encoder and the decoder during training to alleviate both problems. Specifically, in addition to the cross-entropy loss, we build a mean model by integrating the model parameters of previous training rounds, and we measure the consistency between the mean model and the base model as the KL divergence between their encoder output features and between their decoder output probability distributions, thereby imposing regularization constraints on the model's solution space. We conducted extensive experiments on machine translation tasks, and the results show that DOCR raises the BLEU score by 2.60 on average, demonstrating its effectiveness in improving model performance and its complementarity with other regularization techniques.
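To make the objective concrete, below is a minimal PyTorch sketch of a DOCR-style training step as the abstract describes it. It assumes the mean model is maintained as an exponential moving average (EMA) of the base model's parameters, that encoder features are converted to distributions with a softmax before the KL term, and that the two consistency terms are weighted by coefficients alpha and beta. The `.encoder`/`.decoder` interface, the EMA update rule, and all hyperparameter values are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a DOCR-style objective (PyTorch).
# Assumptions (not specified in the abstract): EMA mean model,
# softmax over the feature dimension for the encoder KL term,
# and alpha/beta weights on the two consistency losses.
import torch
import torch.nn.functional as F

def update_mean_model(base, mean, decay=0.999):
    """EMA update: mean <- decay * mean + (1 - decay) * base."""
    with torch.no_grad():
        for p_m, p_b in zip(mean.parameters(), base.parameters()):
            p_m.mul_(decay).add_(p_b, alpha=1.0 - decay)

def docr_loss(base, mean, src, tgt_in, tgt_out, alpha=1.0, beta=1.0):
    # Base model forward pass: encoder features and decoder logits.
    # (An .encoder/.decoder interface returning (B, S, d_model)
    # features and (B, T, vocab) logits is an assumed toy API.)
    enc = base.encoder(src)
    logits = base.decoder(tgt_in, enc)

    # Mean model forward pass; no gradients flow through it.
    with torch.no_grad():
        enc_mean = mean.encoder(src)
        logits_mean = mean.decoder(tgt_in, enc_mean)

    # Standard cross-entropy loss on the decoder output.
    ce = F.cross_entropy(logits.transpose(1, 2), tgt_out)

    # Encoder consistency: KL divergence between the two models'
    # encoder feature distributions.
    kl_enc = F.kl_div(F.log_softmax(enc, dim=-1),
                      F.softmax(enc_mean, dim=-1),
                      reduction="batchmean")

    # Decoder consistency: KL divergence between the two models'
    # output token distributions.
    kl_dec = F.kl_div(F.log_softmax(logits, dim=-1),
                      F.softmax(logits_mean, dim=-1),
                      reduction="batchmean")

    return ce + alpha * kl_enc + beta * kl_dec
```

In this reading, the mean model would be initialized as a frozen deep copy of the base model (e.g. via `copy.deepcopy`) and `update_mean_model` called after each optimizer step; the decay, alpha, and beta values above are placeholders rather than the paper's settings.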
Pages: 13