Double Consistency Regularization for Transformer Networks

Cited by: 1
Authors
Wan, Yuxian [1 ]
Zhang, Wenlin [1 ]
Li, Zhen [1 ]
Affiliations
[1] PLA Strategic Support Force Information Engineering University, School of Information System Engineering, Zhengzhou 450001, People's Republic of China
Keywords
cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
DOI
10.3390/electronics12204357
CLC number
TP [Automation technology, computer technology]
Subject classification
0812
Abstract
Large, deep neural networks based on the Transformer architecture are very powerful on sequence tasks, but they are prone to overfitting when training data are scarce. Moreover, their prediction quality degrades significantly when the input is slightly perturbed. In this work, we propose a double consistency regularization (DOCR) method for end-to-end encoder-decoder models that separately constrains the outputs of the encoder and the decoder during training to alleviate both problems. Specifically, in addition to the cross-entropy loss, we build a mean model by integrating the model parameters of previous training rounds, and we measure the consistency between the mean model and the base model as the KL divergence between their encoder output features and between their decoder output probability distributions, thereby imposing regularization constraints on the model's solution space. We conducted extensive experiments on machine translation tasks, and the results show that DOCR raises the BLEU score by 2.60 on average, demonstrating its effectiveness in improving model performance and its complementarity with other regularization techniques.
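To make the objective concrete, below is a minimal PyTorch sketch of a DOCR-style training step as the abstract describes it. It assumes the mean model is maintained as an exponential moving average (EMA) of the base model's parameters, that encoder features are converted to distributions with a softmax before the KL term, and that the two consistency terms are weighted by coefficients alpha and beta. The `.encoder`/`.decoder` interface, the EMA update rule, and all hyperparameter values are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a DOCR-style objective (PyTorch).
# Assumptions (not specified in the abstract): EMA mean model,
# softmax over the feature dimension for the encoder KL term,
# and alpha/beta weights on the two consistency losses.
import torch
import torch.nn.functional as F

def update_mean_model(base, mean, decay=0.999):
    """EMA update: mean <- decay * mean + (1 - decay) * base."""
    with torch.no_grad():
        for p_m, p_b in zip(mean.parameters(), base.parameters()):
            p_m.mul_(decay).add_(p_b, alpha=1.0 - decay)

def docr_loss(base, mean, src, tgt_in, tgt_out, alpha=1.0, beta=1.0):
    # Base model forward pass: encoder features and decoder logits.
    # (An .encoder/.decoder interface returning (B, S, d_model)
    # features and (B, T, vocab) logits is an assumed toy API.)
    enc = base.encoder(src)
    logits = base.decoder(tgt_in, enc)

    # Mean model forward pass; no gradients flow through it.
    with torch.no_grad():
        enc_mean = mean.encoder(src)
        logits_mean = mean.decoder(tgt_in, enc_mean)

    # Standard cross-entropy loss on the decoder output.
    ce = F.cross_entropy(logits.transpose(1, 2), tgt_out)

    # Encoder consistency: KL divergence between the two models'
    # encoder feature distributions.
    kl_enc = F.kl_div(F.log_softmax(enc, dim=-1),
                      F.softmax(enc_mean, dim=-1),
                      reduction="batchmean")

    # Decoder consistency: KL divergence between the two models'
    # output token distributions.
    kl_dec = F.kl_div(F.log_softmax(logits, dim=-1),
                      F.softmax(logits_mean, dim=-1),
                      reduction="batchmean")

    return ce + alpha * kl_enc + beta * kl_dec
```

In this reading, the mean model would be initialized as a frozen deep copy of the base model (e.g. via `copy.deepcopy`) and `update_mean_model` called after each optimizer step; the decay, alpha, and beta values above are placeholders rather than the paper's settings.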
Pages: 13