Improving Autoregressive NMT with Non-Autoregressive Model

Cited by: 0
Authors
Zhou, Long [1 ,2 ]
Zhang, Jiajun [1 ,2 ]
Zong, Chengqing [1 ,2 ,3 ]
Affiliations
[1] CASIA, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Autoregressive neural machine translation (NMT) models are often used to teach non-autoregressive models via knowledge distillation. However, there are few studies on improving the quality of autoregressive translation (AT) using non-autoregressive translation (NAT). In this work, we propose a novel Encoder-NAD-AD framework for NMT, aiming at boosting AT with global information produced by the NAT model. Specifically, under the semantic guidance of the source-side context captured by the encoder, the non-autoregressive decoder (NAD) first learns to generate the target-side hidden state sequence in parallel. Then the autoregressive decoder (AD) performs translation from left to right, conditioned on both the source-side and target-side hidden states. Since the AD has access to global information generated by the low-latency NAD, it is more likely to produce a better translation at little extra latency. Experiments on WMT14 En⇒De, WMT16 En⇒Ro, and IWSLT14 De⇒En translation tasks demonstrate that our framework achieves significant improvements with only 8% speed degradation over the autoregressive NMT baseline.
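As a rough illustration of the Encoder-NAD-AD pipeline the abstract describes, the sketch below wires a Transformer encoder to a parallel (non-causal) decoder for the NAD and a causal decoder for the AD. This is a minimal sketch built from PyTorch's stock Transformer modules, not the authors' implementation: the class name EncoderNADAD, the use of copied source embeddings as NAD queries, the concatenated memory, and all hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class EncoderNADAD(nn.Module):
    """Encoder -> non-autoregressive decoder (NAD) -> autoregressive decoder (AD)."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        # NAD: cross-attends to the encoder output; no causal mask, so all
        # target-side hidden states are produced in parallel.
        self.nad = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        # AD: ordinary left-to-right decoder, conditioned on source- and
        # target-side hidden states via its cross-attention memory.
        self.ad = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        src_h = self.encoder(self.embed(src_tokens))  # source-side context
        # Copied source embeddings serve as NAD queries (one common NAT choice);
        # a real system would predict the target length instead.
        nad_h = self.nad(self.embed(src_tokens), src_h)
        # Expose the NAD's global information to the AD by concatenating both
        # hidden state sequences along the length axis into one shared memory.
        memory = torch.cat([src_h, nad_h], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        ad_h = self.ad(self.embed(tgt_tokens), memory, tgt_mask=causal)
        return self.proj(ad_h)  # per-position vocabulary logits

model = EncoderNADAD()
logits = model(torch.randint(0, 32000, (2, 7)), torch.randint(0, 32000, (2, 9)))
print(logits.shape)  # torch.Size([2, 9, 32000])
```

Concatenating the two hidden state sequences into a single cross-attention memory is just one simple way to let the AD condition on both; the paper itself does not specify this mechanism, and alternatives such as a separate cross-attention sub-layer over the NAD states would serve the same purpose.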
Pages: 24-29
Number of pages: 6
Related Papers
50 records in total
  • [1] Robust Cardinality Estimator by Non-autoregressive Model
    Ito, Ryuichi
    Xiao, Chuan
    Onizuka, Makoto
    [J]. SOFTWARE FOUNDATIONS FOR DATA INTEROPERABILITY, SFDI 2021, 2022, 1457 : 55 - 61
  • [2] A Study of Non-autoregressive Model for Sequence Generation
    Ren, Yi
    Liu, Jinglin
    Tan, Xu
    Zhao, Zhou
    Zhao, Sheng
    Liu, Tie-Yan
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 149 - 159
  • [3] On the Learning of Non-Autoregressive Transformers
    Huang, Fei
    Tao, Tianhua
    Zhou, Hao
    Li, Lei
    Huang, Minlie
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [4] Non-Autoregressive vs Autoregressive Neural Networks for System Identification
    Weber, Daniel
    Guehmann, Clemens
    [J]. IFAC PAPERSONLINE, 2021, 54 (20): : 692 - 698
  • [5] Improving Non-autoregressive Neural Machine Translation with Monolingual Data
    Zhou, Jiawei
    Keung, Phillip
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1893 - 1898
  • [6] Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 762 - 766
  • [7] An Effective Non-Autoregressive Model for Spoken Language Understanding
    Cheng, Lizhi
    Jia, Weijia
    Yang, Wenmian
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 241 - 250
  • [8] Multitask Non-Autoregressive Model for Human Motion Prediction
    Li, Bin
    Tian, Jian
    Zhang, Zhongfei
    Feng, Hailin
    Li, Xi
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2562 - 2574
  • [9] Better Localness for Non-Autoregressive Transformer
    Wang, Shuheng
    Huang, Heyan
    Shi, Shumin
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (05)
  • [10] Bootstrap prediction intervals for autoregressive models fitted to non-autoregressive processes
    Grigoletto, Matteo
    [J]. Journal of the Italian Statistical Society, 1998, 7 (3)