Double Consistency Regularization for Transformer Networks

Times cited: 1
Authors
Wan, Yuxian [1]
Zhang, Wenlin [1]
Li, Zhen [1]
Affiliation
[1] PLA Strategic Support Force Information Engineering University, School of Information Systems Engineering, Zhengzhou 450001, People's Republic of China
Keywords
cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
DOI
10.3390/electronics12204357
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Large-scale deep neural networks based on the Transformer model are very powerful in sequence tasks, but they are prone to overfitting when the training data are small. Moreover, their predictions on slightly perturbed inputs are significantly worse than on unperturbed inputs. In this work, we propose a double consistency regularization (DOCR) method for end-to-end model structures, which separately constrains the outputs of the encoder and the decoder during training to alleviate these problems. Specifically, in addition to the cross-entropy loss, we build a mean model by integrating the model parameters of previous rounds. We then measure the consistency between the mean model and the base model by computing the KL divergence between their encoder output features and between their decoder output probability distributions, thereby imposing regularization constraints on the solution space of the model. Extensive experiments on machine translation tasks show that DOCR increases the BLEU score by 2.60 on average, demonstrating its effectiveness in improving model performance and its complementarity with other regularization techniques.
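The abstract describes DOCR only at a high level. The following is a minimal pure-Python sketch of the two ingredients it names (a mean model built from earlier parameters, and KL consistency terms on the encoder and decoder outputs added to cross-entropy), under assumptions that are not confirmed by the abstract: the mean model is taken to be a uniform running average of parameter snapshots (the paper may use another scheme, such as an exponential moving average), encoder features are turned into distributions with a softmax, the KL direction is KL(mean || base), and the weights `alpha` and `beta` are hypothetical hyperparameters. All function names are illustrative, not the authors' code.

```python
import math
from typing import Dict, List

Vector = List[float]
Params = Dict[str, Vector]  # hypothetical flat parameter store: name -> values


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a single vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def update_mean_params(mean: Params, current: Params, num_rounds: int) -> Params:
    """Running arithmetic mean of parameter snapshots (assumption: uniform average).

    num_rounds counts the snapshots seen so far, including `current` (starts at 1).
    """
    return {
        name: [m + (c - m) / num_rounds for m, c in zip(mean[name], current[name])]
        for name in mean
    }


def cross_entropy(logits: Vector, target: int) -> float:
    """Negative log-likelihood of the target token, via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def docr_style_loss(
    base_enc: Vector,
    mean_enc: Vector,
    base_dec_logits: Vector,
    mean_dec_logits: Vector,
    target: int,
    alpha: float = 1.0,  # hypothetical weight of the encoder consistency term
    beta: float = 1.0,   # hypothetical weight of the decoder consistency term
) -> float:
    """Cross-entropy plus encoder-side and decoder-side KL consistency terms.

    The mean model's outputs serve as the fixed target (assumption: KL(mean || base)).
    """
    ce = cross_entropy(base_dec_logits, target)
    kl_enc = kl_divergence(softmax(mean_enc), softmax(base_enc))
    kl_dec = kl_divergence(softmax(mean_dec_logits), softmax(base_dec_logits))
    return ce + alpha * kl_enc + beta * kl_dec


if __name__ == "__main__":
    mean = update_mean_params({"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}, num_rounds=2)
    print(mean)  # {'w': [1.0, 3.0]}
    loss = docr_style_loss([0.5, -0.5], [2.0, -2.0], [0.0, 0.0], [1.0, -1.0], target=0)
    print(loss)
```

When the base and mean models agree exactly, both KL terms vanish and the loss reduces to plain cross-entropy; any disagreement adds a non-negative penalty. In a real Transformer these terms would be computed per token and averaged over the batch, with the mean model's outputs excluded from gradient computation; that batching is omitted here for clarity.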
Pages: 13
Related papers (50 in total)
  • [41] Towards Better Robust Generalization with Shift Consistency Regularization
    Zhang, Shufei
    Qian, Zhuang
    Huang, Kaizhu
    Wang, Qiufeng
    Zhang, Rui
    Yi, Xinping
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [42] CONSISTENCY AND CONVERGENCE RATE OF PHYLOGENETIC INFERENCE VIA REGULARIZATION
    Vu Dinh
    Lam Si Tung Ho
    Suchard, Marc A.
    Matsen, Frederick A.
    ANNALS OF STATISTICS, 2018, 46 (04): 1481-1512
  • [43] Consistency of the regularization of gauge theories by high covariant derivatives
    Asorey, M
    Falceto, F
    PHYSICAL REVIEW D, 1996, 54 (08): 5290-5301
  • [44] Towards Generalizable Morph Attack Detection with Consistency Regularization
    Kashiani, Hossein
    Talemi, Niloufar Alipour
    Saadabadi, Mohammad Saeed Ebrahimi
    Nasrabadi, Nasser M.
    2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS, IJCB, 2023
  • [45] Brief Announcement: Program Regularization in Verifying Memory Consistency
    Li, Lei
    Chen, Tianshi
    Chen, Yunji
    Li, Ling
    Qian, Cheng
    Hu, Weiwu
    SPAA 11: PROCEEDINGS OF THE TWENTY-THIRD ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2011: 265-266
  • [46] Consistency Regularization for Domain Generalization with Logit Attribution Matching
    Gao, Han
    Li, Kaican
    Xie, Weiyan
    Lin, Zhi
    Huang, Yongxiang
    Wang, Luning
    Cao, Caleb Chen
    Zhang, Nevin L.
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244: 1389-1407
  • [47] Progressive Probabilistic Graph Matching with Local Consistency Regularization
    Tang, Min
    Wang, Wenmin
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS: 17TH INTERNATIONAL CONFERENCE, CAIP 2017, PT II, 2017, 10425: 105-115
  • [48] Consistency Regularization on Clean Samples for Learning with Noisy Labels
    Nomura, Yuichiro
    Kurita, Takio
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (02): 387-395
  • [49] Consistency Regularization for Deep Face Anti-Spoofing
    Wang, Zezheng
    Yu, Zitong
    Wang, Xun
    Qin, Yunxiao
    Li, Jiahong
    Zhao, Chenxu
    Liu, Xin
    Lei, Zhen
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18: 1127-1140
  • [50] FMixAugment for Semi-supervised Learning with Consistency Regularization
    Lin, Huibin
    Wang, Shiping
    Liu, Zhanghui
    Xiao, Shunxin
    Du, Shide
    Guo, Wenzhong
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020: 127-139