Attentive Multi-Layer Perceptron for Non-autoregressive Generation

Cited: 0
|
Authors
Jiang, Shuyang [1 ]
Zhang, Jun [2 ]
Feng, Jiangtao [2 ]
Zheng, Lin [3 ]
Kong, Lingpeng [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
Keywords
AMLP; Multi-Layer Perceptron; Attention Mechanism; Non-Autoregressive Model; TRANSLATION;
DOI
10.1007/978-3-031-43415-0_36
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Autoregressive (AR) generation largely dominates sequence generation owing to its efficacy. Recently, non-autoregressive (NAR) generation has gained increasing popularity for its efficiency and growing efficacy. However, its efficiency is still bottlenecked by quadratic complexity in sequence length, which is prohibitive for scaling to long-sequence generation, and little work has been done to mitigate this problem. In this paper, we propose a novel MLP variant, Attentive Multi-Layer Perceptron (AMLP), to produce a generation model with linear time and space complexity. Unlike a classic MLP with static, learnable projection matrices, AMLP leverages adaptive projections computed from the inputs in an attentive mode. These sample-aware adaptive projections enable communication among tokens in a sequence and model the measurement between the query and key spaces. Furthermore, we combine AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity. Empirical results show that this combined architecture surpasses competitive efficient NAR models by a significant margin on text-to-speech synthesis and machine translation. We also test AMLP's self- and cross-attention abilities separately with extensive ablation experiments and find them comparable or even superior to other efficient models. An efficiency analysis further shows that AMLP substantially reduces memory cost compared with vanilla non-autoregressive models for long sequences.
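The abstract describes replacing an MLP's static projection matrices with sample-dependent projections computed attentively from the input, which brings the cost down to linear in sequence length. The paper's exact AMLP formulation is not reproduced here; the sketch below is only a minimal illustration of that general idea using a hypothetical landmark-based construction, where `attentive_mlp`, `landmarks`, and the landmark count `m` are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_mlp(x, W_q, W_k, W_v, landmarks):
    """Hypothetical sketch of an "attentive MLP" layer.

    Instead of a static hidden projection, the layer builds a
    sample-dependent projection by attending from a fixed set of m
    landmark vectors to the input sequence, then mixes every token
    through that adaptive matrix. All operations cost O(n * m * d),
    i.e. linear in the sequence length n.
    """
    q = x @ W_q                                              # (n, d) queries
    k = x @ W_k                                              # (n, d) keys
    v = x @ W_v                                              # (n, d) values
    # Adaptive projection: landmarks attend over the sequence -> (m, d)
    attn = softmax(landmarks @ k.T / np.sqrt(k.shape[-1]))   # (m, n)
    proj = attn @ v                                          # (m, d)
    # Tokens mix the m adaptive rows: an MLP whose hidden weights
    # depend on the input sample, enabling token communication.
    mix = softmax(q @ proj.T / np.sqrt(q.shape[-1]))         # (n, m)
    return mix @ proj                                        # (n, d)

rng = np.random.default_rng(0)
n, d, m = 128, 16, 8                       # sequence length, width, landmarks
x = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
landmarks = rng.standard_normal((m, d))
y = attentive_mlp(x, W_q, W_k, W_v, landmarks)
print(y.shape)  # (128, 16)
```

Because the attention maps have shapes (m, n) and (n, m) with m fixed, memory grows linearly with n instead of quadratically, which is the property the abstract claims for long-sequence generation.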
Pages: 612-629
Page count: 18
Related papers
50 records in total
  • [21] Classification of EEG Signal from Imagined Writing using a Combined Autoregressive Model and Multi-Layer Perceptron.
    Zabidi, A.
    Mansor, W.
    Lee, Khuan Y.
    Fadzal, C. W. N. F. Che Wan
    2012 IEEE EMBS CONFERENCE ON BIOMEDICAL ENGINEERING AND SCIENCES (IECBES), 2012,
  • [22] Monotonic multi-layer perceptron networks as universal approximators
    Lang, B
    ARTIFICIAL NEURAL NETWORKS: FORMAL MODELS AND THEIR APPLICATIONS - ICANN 2005, PT 2, PROCEEDINGS, 2005, 3697 : 31 - 37
  • [23] Modifications of the Multi-Layer Perceptron for Hyperspectral Image Classification
    He, Xin
    Chen, Yushi
    REMOTE SENSING, 2021, 13 (17)
  • [24] Multiple optimal learning factors for the multi-layer perceptron
    Malalur, Sanjeev S.
    Manry, Michael T.
    Jesudhas, Praveen
    NEUROCOMPUTING, 2015, 149 : 1490 - 1501
  • [25] Multi-Layer Perceptron Model for Air Quality Prediction
    Abdullah, S.
    Ismail, M.
    Ahmed, A. N.
    MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES, 2019, 13 : 85 - 95
  • [26] Geno-mathematical identification of the multi-layer perceptron
    Ralf Östermark
    Neural Computing and Applications, 2009, 18 : 331 - 344
  • [27] Battle royale optimizer for training multi-layer perceptron
    Agahian, Saeid
    Akan, Taymaz
    EVOLVING SYSTEMS, 2022, 13 (04) : 563 - 575
  • [28] On the Learning of Non-Autoregressive Transformers
    Huang, Fei
    Tao, Tianhua
    Zhou, Hao
    Li, Lei
    Huang, Minlie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [29] Classification of Fake News Using Multi-Layer Perceptron
    Jehad, Reham
    Yousif, Suhad A.
    FOURTH INTERNATIONAL CONFERENCE OF MATHEMATICAL SCIENCES (ICMS 2020), 2021, 2334
  • [30] RECURSIVE ASSEMBLY OF MULTI-LAYER PERCEPTRON NEURAL NETWORKS
    Motato, Eliot
    Radcliffe, Clark
    7TH ANNUAL DYNAMIC SYSTEMS AND CONTROL CONFERENCE, 2014, VOL 2, 2014,