Attentive Multi-Layer Perceptron for Non-autoregressive Generation

Cited by: 0
Authors
Jiang, Shuyang [1]
Zhang, Jun [2]
Feng, Jiangtao [2]
Zheng, Lin [3]
Kong, Lingpeng [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
Keywords
AMLP; Multi-Layer Perceptron; Attention Mechanism; Non-Autoregressive Model; Translation
DOI
10.1007/978-3-031-43415-0_36
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Autoregressive (AR) generation has long dominated sequence generation owing to its efficacy. Recently, non-autoregressive (NAR) generation has gained popularity for its efficiency and steadily improving quality. However, its efficiency is still bottlenecked by the quadratic complexity of attention in the sequence length, which is prohibitive for scaling to long-sequence generation, and little work has been done to mitigate this problem. In this paper, we propose a novel MLP variant, the Attentive Multi-Layer Perceptron (AMLP), to build a generation model with linear time and space complexity. Unlike a classic MLP, whose projection matrices are static and learnable, AMLP uses adaptive projections computed from the input in an attentive manner. These sample-aware adaptive projections enable communication among the tokens of a sequence and model the similarity between the query and key spaces. Furthermore, we combine AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity. Empirical results show that this combined architecture surpasses competitive efficient NAR models by a significant margin on text-to-speech synthesis and machine translation. We also evaluate AMLP's self- and cross-attention abilities separately with extensive ablation experiments, and find them comparable or even superior to other efficient models. An efficiency analysis further shows that AMLP drastically reduces memory cost relative to vanilla non-autoregressive models on long sequences.
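The abstract specifies the mechanism only at a high level: the MLP's projections are not static parameters but are computed from the input itself, which brings the cost down to linear in sequence length. The PyTorch sketch below is a hypothetical illustration of that idea, not the paper's actual method; the class name AttentiveMLP, the ELU-based feature map, and all tensor shapes are assumptions. It builds a fixed-size, input-dependent projection by aggregating key/value features over the context, then applies that projection to each query token.

```python
# Hypothetical sketch of an AMLP-style layer (details assumed, not from the
# paper): the projection applied to each token is computed from the input
# sequence itself, so the layer costs O(n * d^2) time and O(d^2) extra
# memory -- linear in the sequence length n, unlike O(n^2) softmax attention.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveMLP(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Static, learnable maps that produce the query/key/value features
        # from which the adaptive projection is assembled.
        self.to_q = nn.Linear(d_model, d_hidden)
        self.to_k = nn.Linear(d_model, d_hidden)
        self.to_v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor,
                context: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x:       (batch, n, d_model) query-side tokens (self-mode if context is None)
        # context: (batch, m, d_model) key/value-side tokens (cross-mode)
        ctx = x if context is None else context
        q = F.elu(self.to_q(x)) + 1.0    # positive feature maps, as in
        k = F.elu(self.to_k(ctx)) + 1.0  # kernelized linear attention
        v = self.to_v(ctx)
        # Adaptive projection: a (d_hidden, d_model) matrix summarizing the
        # whole context in O(m * d_hidden * d_model), independent of n.
        proj = torch.einsum('bmh,bmd->bhd', k, v)
        norm = k.sum(dim=1)  # (batch, d_hidden) per-feature normalizer
        # Apply the input-dependent projection token by token.
        out = torch.einsum('bnh,bhd->bnd', q, proj)
        denom = torch.einsum('bnh,bh->bn', q, norm).unsqueeze(-1)
        return out / (denom + 1e-6)
```

Under these assumptions, one module covers both roles tested in the paper's ablations: calling layer(x) gives self-attention-like mixing, while layer(x, encoder_states) gives cross-attention-like mixing, in both cases with memory that grows linearly rather than quadratically in sequence length.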
Pages: 612-629
Number of pages: 18
Related Papers
50 records in total
  • [1] Non-autoregressive personalized bundle generation
    Yang, Wenchuan
    Yang, Cheng
    Li, Jichao
    Tan, Yuejin
    Lu, Xin
    Shi, Chuan
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (05)
  • [2] A Study of Non-autoregressive Model for Sequence Generation
    Ren, Yi
    Liu, Jinglin
    Tan, Xu
    Zhao, Zhou
    Zhao, Sheng
    Liu, Tie-Yan
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 149 - 159
  • [3] Graph Attention Multi-Layer Perceptron
    Zhang, Wentao
    Yin, Ziqi
    Sheng, Zeang
    Li, Yang
    Ouyang, Wen
    Li, Xiaosen
    Tao, Yangyu
    Yang, Zhi
    Cui, Bin
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4560 - 4570
  • [4] Symbolic representation of a multi-layer perceptron
    Mouria-Beji, F
    ARTIFICIAL NEURAL NETS AND GENETIC ALGORITHMS, 2001, : 205 - 208
  • [5] Local design for multi-layer perceptron
    Xu, Li
Zidonghua Xuebao/Acta Automatica Sinica, 1997, 23 (03): 325 - 331
  • [6] BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
    Qi, Weizhen
    Gong, Yeyun
    Jiao, Jian
    Yan, Yu
    Chen, Weizhu
    Liu, Dayiheng
    Tang, Kewen
    Li, Houqiang
    Chen, Jiusheng
    Zhang, Ruofei
    Zhou, Ming
    Duan, Nan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] Diffusion Models for Non-autoregressive Text Generation: A Survey
    Li, Yifan
    Zhou, Kun
    Zhao, Wayne Xin
    Wen, Ji-Rong
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6692 - 6701
  • [8] Tighter Guarantees for the Compressive Multi-layer Perceptron
    Kaban, Ata
    Thummanusarn, Yamonporn
    THEORY AND PRACTICE OF NATURAL COMPUTING (TPNC 2018), 2018, 11324 : 388 - 400
  • [9] Multi-Layer Perceptron with Pulse Glial Chain
    Ikuta, Chihiro
    Uwate, Yoko
    Nishio, Yoshifumi
    Yang, Guoan
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2016, E99A (03): 742 - 755
  • [10] Multi-layer perceptron mapping on a SIMD architecture
    Vitabile, S
    Gentile, A
    Dammone, GB
    Sorbello, F
    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, 2002, : 667 - 675