Double Consistency Regularization for Transformer Networks

Cited by: 1

Authors:

Wan, Yuxian [1]

Zhang, Wenlin [1]

Li, Zhen [1]

Affiliations:

[1] PLA Strateg Support Force Informat Engn Univ, Sch Informat Syst Engn, Zhengzhou 450001, Peoples R China

Source:

ELECTRONICS | 2023, Vol. 12, No. 20

Keywords:

cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization;

DOI:

10.3390/electronics12204357

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Large, deep neural networks based on the Transformer model are very powerful on sequence tasks, but they are prone to overfitting when training data are scarce. Moreover, the model's prediction quality drops significantly when the input is slightly perturbed, compared with unperturbed input. In this work, we propose a double consistency regularization (DOCR) method for end-to-end model structures, which separately constrains the outputs of the encoder and the decoder during training to alleviate these problems. Specifically, on top of the cross-entropy loss, we build a mean model by integrating the model parameters of previous rounds. We then measure the consistency between the mean model and the base model by computing the KL divergence between their encoder output features and between their decoder output probability distributions, thereby imposing regularization constraints on the model's solution space. We conducted extensive experiments on machine translation tasks, and the results show that the BLEU score increased by 2.60 on average, demonstrating the effectiveness of DOCR in improving model performance and its complementarity with other regularization techniques.
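The loss described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify how the parameters of previous rounds are integrated, how encoder features are compared, the direction of the KL divergence, or how the terms are weighted. The running arithmetic mean, the softmax applied to encoder features, the KL direction (mean model to base model), and the weights `alpha` and `beta` are all assumptions made for illustration, as are the function names.

```python
import math


def softmax(xs):
    """Turn a list of scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def update_mean_params(mean_params, base_params, num_averaged):
    """Assumption: the mean model is a running arithmetic mean of the
    base-model parameters seen in previous rounds."""
    return [
        (m * num_averaged + b) / (num_averaged + 1)
        for m, b in zip(mean_params, base_params)
    ]


def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target_index] + eps)


def docr_style_loss(base_enc, mean_enc, base_logits, mean_logits,
                    target_index, alpha=1.0, beta=1.0):
    """Cross-entropy plus two consistency terms (encoder and decoder).

    Assumptions: encoder features are normalized with softmax before the
    KL term, and alpha/beta are hypothetical weighting hyperparameters.
    """
    base_probs = softmax(base_logits)
    mean_probs = softmax(mean_logits)
    ce = cross_entropy(base_probs, target_index)
    enc_kl = kl_divergence(softmax(mean_enc), softmax(base_enc))
    dec_kl = kl_divergence(mean_probs, base_probs)
    return ce + alpha * enc_kl + beta * dec_kl


# Toy usage: one target token, three-dimensional features and vocabulary.
base_enc, base_logits = [0.2, -0.1, 0.5], [2.0, 0.5, -1.0]
loss_same = docr_style_loss(base_enc, base_enc, base_logits, base_logits, 0)
loss_diff = docr_style_loss(base_enc, [0.6, 0.0, 0.1],
                            base_logits, [1.0, 1.2, -0.5], 0)
```

When the mean model and the base model agree exactly, both KL terms vanish and the loss reduces to plain cross-entropy; any disagreement adds a positive penalty, which is the regularizing effect the abstract describes.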


Pages: 13

50 records in total

[1] DropDim: A Regularization Method for Transformer Networks
Zhang, Hao
Qu, Dan
Shao, Keji
Yang, Xukui
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 474 - 478
[2] Randomness Regularization With Simple Consistency Training for Neural Networks
Li, Juntao
Liang, Xiaobo
Wu, Lijun
Wang, Yue
Meng, Qi
Qin, Tao
Zhang, Min
Liu, Tie-Yan
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5763 - 5778
[3] Identification of NARX models using regularization networks: a consistency result
De Nicolao, G
Trecate, GF
IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 2407 - 2412
[4] RESIDUAL SWIN TRANSFORMER UNET WITH CONSISTENCY REGULARIZATION FOR AUTOMATIC BREAST ULTRASOUND TUMOR SEGMENTATION
Zhuang, Xianwei
Zhu, Xiner
Hu, Haoji
Yao, Jincao
Li, Wei
Yang, Chen
Wang, Liping
Feng, Na
Xu, Dong
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3071 - 3075
[5] Hyperspherical Consistency Regularization
Tan, Cheng
Gao, Zhangyang
Wu, Lirong
Li, Siyuan
Li, Stan Z.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7234 - 7245
[6] ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis
Zhang, Qiongan
Shi, Lei
Liu, Peiyu
Zhu, Zhenfang
Xu, Liancheng
APPLIED INTELLIGENCE, 2023, 53 (12) : 16332 - 16345
[7] Twin Fuzzy Networks With Interpolation Consistency Regularization for Weakly Supervised Anomaly Detection
Cao, Zhi
Shi, Ye
Chang, Yu-Cheng
Yao, Xin
Lin, Chin-Teng
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (09) : 5086 - 5097
[8] Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization
Xue, Xizhe
Yu, Dongdong
Liu, Lingqiao
Liu, Yu
Tsutsui, Satoshi
Li, Ying
Yuan, Zehuan
Song, Ping
Shou, Mike Zheng
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2507 - 2515
[9] Consistency Regularization for Adversarial Robustness
Tack, Jihoon
Yu, Sihyun
Jeong, Jongheon
Kim, Minseon
Hwang, Sung Ju
Shin, Jinwoo
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8414 - 8422
[10] Improved Consistency Regularization for GANs
Zhao, Zhengli
Singh, Sameer
Lee, Honglak
Zhang, Zizhao
Odena, Augustus
Zhang, Han
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11033 - 11041
