Double Consistency Regularization for Transformer Networks

Cited by: 1
Authors
Wan, Yuxian [1]
Zhang, Wenlin [1]
Li, Zhen [1]
Affiliation
[1] PLA Strateg Support Force Informat Engn Univ, Sch Informat Syst Engn, Zhengzhou 450001, Peoples R China
Keywords
cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
DOI
10.3390/electronics12204357
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Large, deep neural networks based on the Transformer model are very powerful in sequence tasks, but they are prone to overfitting on small-scale training data. Moreover, the model's prediction quality is significantly lower when the input is slightly perturbed than when it is not. In this work, we propose a double consistency regularization (DOCR) method for end-to-end model structures, which separately constrains the outputs of the encoder and the decoder during training to alleviate these problems. Specifically, on top of the cross-entropy loss, we build a mean model by integrating the model parameters of previous rounds, and we measure the consistency between the mean model and the base model by computing the KL divergence between their encoder output features and between their decoder output probability distributions, thereby imposing regularization constraints on the model's solution space. We conducted extensive experiments on machine translation tasks, and the results show that the BLEU score increased by 2.60 on average, demonstrating the effectiveness of DOCR in improving model performance and its complementarity with other regularization techniques.
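The loss structure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the running-average form of the mean model, the softmax normalization of encoder features, the KL direction, and the weights `alpha` and `beta` are all assumptions, and every function name here is hypothetical.

```python
import math

def softmax(logits):
    """Turn a list of real scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-likelihood of the reference token."""
    return -math.log(probs[target_index] + eps)

def update_mean_params(mean_params, base_params, num_averaged):
    """Assumed form of the mean model: a running average of the
    parameters from the rounds seen so far."""
    return [(m * num_averaged + b) / (num_averaged + 1)
            for m, b in zip(mean_params, base_params)]

def docr_loss(base_enc, mean_enc, base_logits, mean_logits,
              target_index, alpha=1.0, beta=1.0):
    """Cross-entropy plus two consistency terms (encoder and decoder).

    Assumptions: encoder features are normalized with softmax before
    the KL term, and alpha/beta are hypothetical weighting factors.
    """
    base_probs = softmax(base_logits)
    ce = cross_entropy(base_probs, target_index)
    enc_kl = kl_divergence(softmax(mean_enc), softmax(base_enc))
    dec_kl = kl_divergence(softmax(mean_logits), base_probs)
    return ce + alpha * enc_kl + beta * dec_kl
```

When the base model and the mean model agree exactly, both KL terms vanish and the loss reduces to plain cross-entropy; any disagreement at the encoder or decoder adds a non-negative penalty.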
Pages: 13
Related papers
50 in total
  • [21] Enhance the Hidden Structure of Deep Neural Networks by Double Laplacian Regularization
    Fan, Yetian
    Yang, Wenyu
    Song, Bo
    Yan, Peilei
    Kang, Xiaoning
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (08) : 3114 - 3118
  • [22] Retraction Note: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis
    Qiongan Zhang
    Lei Shi
    Peiyu Liu
    Zhenfang Zhu
    Liancheng Xu
    Applied Intelligence, 2023, 53 : 19808 - 19808
  • [23] RETRACTED ARTICLE: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis
    Qiongan Zhang
    Lei Shi
    Peiyu Liu
    Zhenfang Zhu
    Liancheng Xu
    Applied Intelligence, 2023, 53 : 16332 - 16345
  • [24] Prediction consistency regularization for Generalized Category Discovery
    Duan, Yu
    He, Junzhi
    Zhang, Runxin
    Wang, Rong
    Li, Xuelong
    Nie, Feiping
    INFORMATION FUSION, 2024, 112
  • [25] Consistency Regularization for Variational Auto-Encoders
    Sinha, Samarth
    Dieng, Adji B.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [26] Sample Efficiency of Data Augmentation Consistency Regularization
    Yang, Shuo
    Dong, Yijun
    Ward, Rachel
    Dhillon, Inderjit S.
    Sanghavi, Sujay
    Lei, Qi
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [27] Augmentation, Mixing, and Consistency Regularization for Domain Generalization
    Mehmood, Noaman
    Barner, Kenneth
    2024 IEEE 3RD INTERNATIONAL CONFERENCE ON COMPUTING AND MACHINE INTELLIGENCE, ICMI 2024, 2024,
  • [28] Augmentation-induced Consistency Regularization for Classification
    Wu, Jianhan
    Si, Shijing
    Wang, Jianzong
    Xiao, Jing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [29] Kernel Sliced Inverse Regression: Regularization and Consistency
    Wu, Qiang
    Liang, Feng
    Mukherjee, Sayan
    ABSTRACT AND APPLIED ANALYSIS, 2013,
  • [30] CONSISTENCY IN DIMENSIONAL REGULARIZATION WITH GAMMA-5
    BONNEAU, G
    PHYSICS LETTERS B, 1980, 96 (1-2) : 147 - 150