Attentive Multi-Layer Perceptron for Non-autoregressive Generation

Cited by: 0
Authors
Jiang, Shuyang [1]
Zhang, Jun [2]
Feng, Jiangtao [2]
Zheng, Lin [3]
Kong, Lingpeng [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
Keywords
AMLP; Multi-Layer Perceptron; Attention Mechanism; Non-Autoregressive Model; Translation
DOI
10.1007/978-3-031-43415-0_36
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Autoregressive (AR) generation has long dominated sequence generation owing to its efficacy. Recently, non-autoregressive (NAR) generation has gained popularity for its efficiency and steadily improving quality. However, its efficiency is still bottlenecked by the quadratic complexity of attention in the sequence length, which is prohibitive for scaling to long-sequence generation, and little work has been done to mitigate this problem. In this paper, we propose a novel MLP variant, the Attentive Multi-Layer Perceptron (AMLP), to build a generation model with linear time and space complexity. Unlike a classic MLP, whose projection matrices are static and learnable, AMLP uses adaptive projections computed from the input in an attentive manner. These sample-aware adaptive projections enable communication among the tokens of a sequence and model the similarity between the query and key spaces. Furthermore, we combine AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity. Empirical results show that this combined architecture surpasses competitive efficient NAR models by a significant margin on text-to-speech synthesis and machine translation. We also evaluate AMLP's self- and cross-attention abilities separately with extensive ablation experiments, and find them comparable or even superior to other efficient models. An efficiency analysis further shows that AMLP drastically reduces memory cost relative to vanilla non-autoregressive models on long sequences.
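The abstract specifies the mechanism only at a high level: the MLP's projections are not static parameters but are computed from the input itself, which brings the cost down to linear in sequence length. The PyTorch sketch below is a hypothetical illustration of that idea, not the paper's actual method; the class name AttentiveMLP, the ELU-based feature map, and all tensor shapes are assumptions. It builds a fixed-size, input-dependent projection by aggregating key/value features over the context, then applies that projection to each query token.

```python
# Hypothetical sketch of an AMLP-style layer (details assumed, not from the
# paper): the projection applied to each token is computed from the input
# sequence itself, so the layer costs O(n * d^2) time and O(d^2) extra
# memory -- linear in the sequence length n, unlike O(n^2) softmax attention.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveMLP(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Static, learnable maps that produce the query/key/value features
        # from which the adaptive projection is assembled.
        self.to_q = nn.Linear(d_model, d_hidden)
        self.to_k = nn.Linear(d_model, d_hidden)
        self.to_v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor,
                context: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x:       (batch, n, d_model) query-side tokens (self-mode if context is None)
        # context: (batch, m, d_model) key/value-side tokens (cross-mode)
        ctx = x if context is None else context
        q = F.elu(self.to_q(x)) + 1.0    # positive feature maps, as in
        k = F.elu(self.to_k(ctx)) + 1.0  # kernelized linear attention
        v = self.to_v(ctx)
        # Adaptive projection: a (d_hidden, d_model) matrix summarizing the
        # whole context in O(m * d_hidden * d_model), independent of n.
        proj = torch.einsum('bmh,bmd->bhd', k, v)
        norm = k.sum(dim=1)  # (batch, d_hidden) per-feature normalizer
        # Apply the input-dependent projection token by token.
        out = torch.einsum('bnh,bhd->bnd', q, proj)
        denom = torch.einsum('bnh,bh->bn', q, norm).unsqueeze(-1)
        return out / (denom + 1e-6)
```

Under these assumptions, one module covers both roles tested in the paper's ablations: calling layer(x) gives self-attention-like mixing, while layer(x, encoder_states) gives cross-attention-like mixing, in both cases with memory that grows linearly rather than quadratically in sequence length.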
Pages: 612-629
Number of pages: 18
Related Papers
50 records in total
  • [1] Non-autoregressive personalized bundle generation
    Yang, Wenchuan
    Yang, Cheng
    Li, Jichao
    Tan, Yuejin
    Lu, Xin
    Shi, Chuan
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (05)
  • [2] A Study of Non-autoregressive Model for Sequence Generation
    Ren, Yi
    Liu, Jinglin
    Tan, Xu
    Zhao, Zhou
    Zhao, Sheng
    Liu, Tie-Yan
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 149 - 159
  • [3] Graph Attention Multi-Layer Perceptron
    Zhang, Wentao
    Yin, Ziqi
    Sheng, Zeang
    Li, Yang
    Ouyang, Wen
    Li, Xiaosen
    Tao, Yangyu
    Yang, Zhi
    Cui, Bin
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4560 - 4570
  • [4] Symbolic representation of a multi-layer perceptron
    Mouria-Beji, F
    ARTIFICIAL NEURAL NETS AND GENETIC ALGORITHMS, 2001, : 205 - 208
  • [5] Local design for multi-layer perceptron
    Xu, Li
Zidonghua Xuebao/Acta Automatica Sinica, 1997, 23 (03): 325 - 331
  • [6] BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
    Qi, Weizhen
    Gong, Yeyun
    Jiao, Jian
    Yan, Yu
    Chen, Weizhu
    Liu, Dayiheng
    Tang, Kewen
    Li, Houqiang
    Chen, Jiusheng
    Zhang, Ruofei
    Zhou, Ming
    Duan, Nan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] Diffusion Models for Non-autoregressive Text Generation: A Survey
    Li, Yifan
    Zhou, Kun
    Zhao, Wayne Xin
    Wen, Ji-Rong
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6692 - 6701
  • [8] Tighter Guarantees for the Compressive Multi-layer Perceptron
    Kaban, Ata
    Thummanusarn, Yamonporn
    THEORY AND PRACTICE OF NATURAL COMPUTING (TPNC 2018), 2018, 11324 : 388 - 400
  • [9] Multi-Layer Perceptron with Pulse Glial Chain
    Ikuta, Chihiro
    Uwate, Yoko
    Nishio, Yoshifumi
    Yang, Guoan
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2016, E99A (03): 742 - 755
  • [10] Multi-layer perceptron mapping on a SIMD architecture
    Vitabile, S
    Gentile, A
    Dammone, GB
    Sorbello, F
    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, 2002, : 667 - 675