Attentive Multi-Layer Perceptron for Non-autoregressive Generation

Cited: 0
|
Authors
Jiang, Shuyang [1 ]
Zhang, Jun [2 ]
Feng, Jiangtao [2 ]
Zheng, Lin [3 ]
Kong, Lingpeng [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
Keywords
AMLP; Multi-Layer Perceptron; Attention Mechanism; Non-Autoregressive Model; TRANSLATION;
DOI
10.1007/978-3-031-43415-0_36
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Autoregressive (AR) generation largely dominates sequence generation owing to its efficacy. Recently, non-autoregressive (NAR) generation has gained increasing popularity for its efficiency and growing efficacy. However, its efficiency is still bottlenecked by quadratic complexity in sequence length, which is prohibitive for scaling to long-sequence generation, and little work has been done to mitigate this problem. In this paper, we propose a novel MLP variant, Attentive Multi-Layer Perceptron (AMLP), to produce a generation model with linear time and space complexity. Unlike a classic MLP with static, learnable projection matrices, AMLP leverages adaptive projections computed from the inputs in an attentive mode. These sample-aware adaptive projections enable communication among tokens in a sequence and model the measurement between the query and key spaces. Furthermore, we combine AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity. Empirical results show that this combined architecture surpasses competitive efficient NAR models by a significant margin on text-to-speech synthesis and machine translation. We also test AMLP's self- and cross-attention abilities separately with extensive ablation experiments and find them comparable or even superior to other efficient models. An efficiency analysis further shows that AMLP substantially reduces memory cost compared with vanilla non-autoregressive models for long sequences.
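The abstract describes replacing an MLP's static projection matrices with sample-dependent projections computed attentively from the input, which brings the cost down to linear in sequence length. The paper's exact AMLP formulation is not reproduced here; the sketch below is only a minimal illustration of that general idea using a hypothetical landmark-based construction, where `attentive_mlp`, `landmarks`, and the landmark count `m` are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_mlp(x, W_q, W_k, W_v, landmarks):
    """Hypothetical sketch of an "attentive MLP" layer.

    Instead of a static hidden projection, the layer builds a
    sample-dependent projection by attending from a fixed set of m
    landmark vectors to the input sequence, then mixes every token
    through that adaptive matrix. All operations cost O(n * m * d),
    i.e. linear in the sequence length n.
    """
    q = x @ W_q                                              # (n, d) queries
    k = x @ W_k                                              # (n, d) keys
    v = x @ W_v                                              # (n, d) values
    # Adaptive projection: landmarks attend over the sequence -> (m, d)
    attn = softmax(landmarks @ k.T / np.sqrt(k.shape[-1]))   # (m, n)
    proj = attn @ v                                          # (m, d)
    # Tokens mix the m adaptive rows: an MLP whose hidden weights
    # depend on the input sample, enabling token communication.
    mix = softmax(q @ proj.T / np.sqrt(q.shape[-1]))         # (n, m)
    return mix @ proj                                        # (n, d)

rng = np.random.default_rng(0)
n, d, m = 128, 16, 8                       # sequence length, width, landmarks
x = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
landmarks = rng.standard_normal((m, d))
y = attentive_mlp(x, W_q, W_k, W_v, landmarks)
print(y.shape)  # (128, 16)
```

Because the attention maps have shapes (m, n) and (n, m) with m fixed, memory grows linearly with n instead of quadratically, which is the property the abstract claims for long-sequence generation.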
Pages: 612-629
Page count: 18
Related papers
50 records in total
  • [21] Classification of EEG Signal from Imagined Writing using a Combined Autoregressive Model and Multi-Layer Perceptron.
    Zabidi, A.
    Mansor, W.
    Lee, Khuan Y.
    Fadzal, C. W. N. F. Che Wan
    2012 IEEE EMBS CONFERENCE ON BIOMEDICAL ENGINEERING AND SCIENCES (IECBES), 2012,
  • [22] Monotonic multi-layer perceptron networks as universal approximators
    Lang, B
    ARTIFICIAL NEURAL NETWORKS: FORMAL MODELS AND THEIR APPLICATIONS - ICANN 2005, PT 2, PROCEEDINGS, 2005, 3697 : 31 - 37
  • [23] Modifications of the Multi-Layer Perceptron for Hyperspectral Image Classification
    He, Xin
    Chen, Yushi
    REMOTE SENSING, 2021, 13 (17)
  • [24] Multiple optimal learning factors for the multi-layer perceptron
    Malalur, Sanjeev S.
    Manry, Michael T.
    Jesudhas, Praveen
    NEUROCOMPUTING, 2015, 149 : 1490 - 1501
  • [25] Multi-Layer Perceptron Model for Air Quality Prediction
    Abdullah, S.
    Ismail, M.
    Ahmed, A. N.
    MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES, 2019, 13 : 85 - 95
  • [26] Geno-mathematical identification of the multi-layer perceptron
    Ralf Östermark
    Neural Computing and Applications, 2009, 18 : 331 - 344
  • [27] Battle royale optimizer for training multi-layer perceptron
    Agahian, Saeid
    Akan, Taymaz
    EVOLVING SYSTEMS, 2022, 13 (04) : 563 - 575
  • [28] On the Learning of Non-Autoregressive Transformers
    Huang, Fei
    Tao, Tianhua
    Zhou, Hao
    Li, Lei
    Huang, Minlie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [29] Classification of Fake News Using Multi-Layer Perceptron
    Jehad, Reham
    Yousif, Suhad A.
    FOURTH INTERNATIONAL CONFERENCE OF MATHEMATICAL SCIENCES (ICMS 2020), 2021, 2334
  • [30] RECURSIVE ASSEMBLY OF MULTI-LAYER PERCEPTRON NEURAL NETWORKS
    Motato, Eliot
    Radcliffe, Clark
    7TH ANNUAL DYNAMIC SYSTEMS AND CONTROL CONFERENCE, 2014, VOL 2, 2014,