END-TO-END SPOKEN LANGUAGE UNDERSTANDING USING TRANSFORMER NETWORKS AND SELF-SUPERVISED PRE-TRAINED FEATURES

Cited by: 1
Authors
Morais, Edmilson [1]
Kuo, Hong-Kwang J. [1]
Thomas, Samuel [1]
Tuske, Zoltan [1]
Kingsbury, Brian [1]
Affiliations
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
Spoken language understanding; transformer networks; self-supervised pre-training; end-to-end systems
DOI
10.1109/ICASSP39728.2021.9414522
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in natural language processing (NLP); however, their merits in spoken language understanding (SLU) still require further investigation. In this paper, we introduce a modular end-to-end (E2E) SLU architecture based on transformer networks that allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization, and multi-task training. We perform several SLU experiments on the ATIS dataset, predicting intent and entity labels/values, to investigate how pre-trained model initialization and multi-task training interact with either traditional filterbank features or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all experiments, but also that, when combined with multi-task training, these features nearly eliminate the need for pre-trained model initialization.
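To make the setup described in the abstract concrete, below is a minimal sketch of this kind of modular E2E SLU model: a transformer encoder over acoustic features with separate heads for utterance-level intent classification and frame-level entity tagging, trained with a joint multi-task loss. It is written in PyTorch; all module names, dimensions, label counts, and the loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a modular E2E SLU model with multi-task heads.
# Sizes, head design, and the 0.5 loss weight are illustrative assumptions.
import torch
import torch.nn as nn

class E2ESLUModel(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_heads=4, n_layers=6,
                 n_intents=26, n_entity_tags=120):
        super().__init__()
        # Project input acoustic features into the model dimension.
        # 80-dim filterbank here; a self-supervised front end (e.g. a
        # wav2vec-style encoder) would simply change feat_dim.
        self.input_proj = nn.Linear(feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Multi-task heads: one intent per utterance, one tag per frame.
        self.intent_head = nn.Linear(d_model, n_intents)
        self.entity_head = nn.Linear(d_model, n_entity_tags)

    def forward(self, feats):
        # feats: (batch, time, feat_dim)
        h = self.encoder(self.input_proj(feats))
        intent_logits = self.intent_head(h.mean(dim=1))  # pooled over time
        entity_logits = self.entity_head(h)              # per-frame logits
        return intent_logits, entity_logits

# Joint multi-task training step on a dummy batch.
model = E2ESLUModel()
feats = torch.randn(2, 300, 80)
intent_logits, entity_logits = model(feats)
intent_loss = nn.functional.cross_entropy(
    intent_logits, torch.randint(0, 26, (2,)))
entity_loss = nn.functional.cross_entropy(
    entity_logits.transpose(1, 2),            # (batch, classes, time)
    torch.randint(0, 120, (2, 300)))
loss = intent_loss + 0.5 * entity_loss        # weighted multi-task objective
loss.backward()
```

Swapping filterbank input for self-supervised pre-trained features leaves the rest of the model unchanged; only the front end and `feat_dim` differ, which reflects the modularity the abstract emphasizes.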
Pages: 7483-7487 (5 pages)