Double Consistency Regularization for Transformer Networks

Cited by: 1
Authors
Wan, Yuxian [1]
Zhang, Wenlin [1]
Li, Zhen [1]
Affiliations
[1] PLA Strateg Support Force Informat Engn Univ, Sch Informat Syst Engn, Zhengzhou 450001, Peoples R China
Keywords
cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
DOI
10.3390/electronics12204357
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Large, deep neural networks based on the Transformer model are very powerful on sequence tasks, but they are prone to overfitting when the training data is small. Moreover, the model's prediction quality on slightly perturbed inputs is significantly lower than on unperturbed inputs. In this work, we propose a double consistency regularization (DOCR) method for end-to-end model structures, which separately constrains the outputs of the encoder and the decoder during training to alleviate these problems. Specifically, in addition to the cross-entropy loss, we build a mean model by integrating the model parameters of previous rounds and measure the consistency between the mean model and the base model by calculating the KL divergence between their encoder output features and between their decoder output probability distributions, thereby imposing regularization constraints on the model's solution space. We conducted extensive experiments on machine translation tasks, and the results show that the BLEU score increased by 2.60 on average, demonstrating the effectiveness of DOCR in improving model performance and its complementarity with other regularization techniques.
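The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the mean model is taken to be a running arithmetic mean of parameters, encoder features are turned into distributions with a softmax before the KL divergence is computed, the KL direction is mean model to base model, and the weights `alpha` and `beta` and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Convert a list of raw scores into a probability distribution."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def update_mean_params(mean_params, base_params, num_averaged):
    """Running arithmetic mean of parameters (assumed averaging scheme).

    num_averaged is the number of parameter snapshots already folded
    into mean_params.
    """
    return [m + (b - m) / (num_averaged + 1)
            for m, b in zip(mean_params, base_params)]


def docr_loss(ce_loss, enc_base, enc_mean, dec_logits_base, dec_logits_mean,
              alpha=1.0, beta=1.0):
    """Cross-entropy plus two consistency terms (encoder and decoder).

    alpha and beta are hypothetical weights; the softmax over encoder
    features is an assumption made so that a KL divergence is defined.
    """
    enc_term = kl_divergence(softmax(enc_mean), softmax(enc_base))
    dec_term = kl_divergence(softmax(dec_logits_mean), softmax(dec_logits_base))
    return ce_loss + alpha * enc_term + beta * dec_term
```

When the base model and the mean model produce identical encoder features and decoder logits, both consistency terms vanish and the loss reduces to the cross-entropy term; any disagreement between the two models adds a positive penalty.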
Pages: 13
Related papers
50 in total
  • [1] DropDim: A Regularization Method for Transformer Networks
    Zhang, Hao
    Qu, Dan
    Shao, Keji
    Yang, Xukui
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 474 - 478
  • [2] Randomness Regularization With Simple Consistency Training for Neural Networks
    Li, Juntao
    Liang, Xiaobo
    Wu, Lijun
    Wang, Yue
    Meng, Qi
    Qin, Tao
    Zhang, Min
    Liu, Tie-Yan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5763 - 5778
  • [3] Identification of NARX models using regularization networks: a consistency result
    De Nicolao, G
    Trecate, GF
    IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 2407 - 2412
  • [4] RESIDUAL SWIN TRANSFORMER UNET WITH CONSISTENCY REGULARIZATION FOR AUTOMATIC BREAST ULTRASOUND TUMOR SEGMENTATION
    Zhuang, Xianwei
    Zhu, Xiner
    Hu, Haoji
    Yao, Jincao
    Li, Wei
    Yang, Chen
    Wang, Liping
    Feng, Na
    Xu, Dong
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3071 - 3075
  • [5] Hyperspherical Consistency Regularization
    Tan, Cheng
    Gao, Zhangyang
    Wu, Lirong
    Li, Siyuan
    Li, Stan Z.
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7234 - 7245
  • [6] ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis
    Zhang, Qiongan
    Shi, Lei
    Liu, Peiyu
    Zhu, Zhenfang
    Xu, Liancheng
    APPLIED INTELLIGENCE, 2023, 53 (12) : 16332 - 16345
  • [7] Twin Fuzzy Networks With Interpolation Consistency Regularization for Weakly Supervised Anomaly Detection
    Cao, Zhi
    Shi, Ye
    Chang, Yu-Cheng
    Yao, Xin
    Lin, Chin-Teng
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (09) : 5086 - 5097
  • [8] Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization
    Xue, Xizhe
    Yu, Dongdong
    Liu, Lingqiao
    Liu, Yu
    Tsutsui, Satoshi
    Li, Ying
    Yuan, Zehuan
    Song, Ping
    Shou, Mike Zheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2507 - 2515
  • [9] Consistency Regularization for Adversarial Robustness
    Tack, Jihoon
    Yu, Sihyun
    Jeong, Jongheon
    Kim, Minseon
    Hwang, Sung Ju
    Shin, Jinwoo
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8414 - 8422
  • [10] Improved Consistency Regularization for GANs
    Zhao, Zhengli
    Singh, Sameer
    Lee, Honglak
    Zhang, Zizhao
    Odena, Augustus
    Zhang, Han
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11033 - 11041