On joint training with interfaces for spoken language understanding

Cited by: 1
Authors
Raju, Anirudh [1]
Rao, Milind [1]
Tiwari, Gautam [1]
Dheram, Pranav [1]
Anderson, Bryan [1]
Zhang, Zhe [1]
Lee, Chul [1]
Bui, Bach [1]
Rastrow, Ariya [1]
Affiliations
[1] Amazon Alexa AI, San Mateo, CA 94404 USA
Source
INTERSPEECH 2022
Keywords
speech recognition; spoken language understanding; neural interfaces; multitask training; networks; ASR
DOI
10.21437/Interspeech.2022-11067
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Spoken language understanding (SLU) systems extract both text transcripts and semantics, in the form of intents and slots, from input speech utterances. SLU systems usually consist of (1) an automatic speech recognition (ASR) module, (2) an interface module that exposes relevant outputs from ASR, and (3) a natural language understanding (NLU) module. Interfaces in SLU systems carry either text transcriptions or richer information, such as neural embeddings, from ASR to NLU. In this paper, we study how interfaces affect joint training for spoken language understanding. Most notably, we obtain state-of-the-art results on the publicly available 50-hr SLURP [1] dataset. We first leverage large pretrained ASR and NLU models connected by a text interface, and then jointly train both models via a sequence loss function. For scenarios where pretrained models are not used, the best results are obtained by joint sequence-loss training over richer neural interfaces. Finally, we show that the benefit of leveraging pretrained models diminishes as the training data size increases.
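The abstract describes a pipeline architecture (ASR module, interface, NLU module) trained jointly, where the interface carries either text hypotheses or neural embeddings from ASR to NLU. Below is a minimal PyTorch sketch of the neural-interface variant, not the authors' implementation: all module names, dimensions, and the intent-only loss are illustrative assumptions, and the paper's full joint objective also involves a sequence loss that is omitted here. The point of the sketch is that, with a neural interface, the NLU loss backpropagates through the interface into the ASR encoder, which is what makes the training joint.

import torch
import torch.nn as nn

class TinyASREncoder(nn.Module):
    """Stand-in for a pretrained ASR encoder that emits frame-level embeddings."""
    def __init__(self, feat_dim=80, hidden_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, audio_feats):            # (batch, frames, feat_dim)
        emb, _ = self.rnn(audio_feats)
        return emb                             # (batch, frames, hidden_dim)

class TinyNLUHead(nn.Module):
    """Stand-in for an NLU model that consumes the interface representation."""
    def __init__(self, hidden_dim=256, num_intents=60):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_intents)

    def forward(self, interface_repr):
        pooled = interface_repr.mean(dim=1)    # crude mean pooling over frames
        return self.classifier(pooled)         # intent logits

asr = TinyASREncoder()
nlu = TinyNLUHead()
# One optimizer over both modules: the defining property of joint training.
optim = torch.optim.Adam(list(asr.parameters()) + list(nlu.parameters()), lr=1e-4)

audio = torch.randn(4, 100, 80)                # dummy batch of 4 utterances
intent_labels = torch.randint(0, 60, (4,))     # dummy intent targets

optim.zero_grad()
logits = nlu(asr(audio))                       # neural interface: embeddings flow straight into NLU
loss = nn.functional.cross_entropy(logits, intent_labels)
loss.backward()                                # semantic-loss gradients reach the ASR encoder too
optim.step()

With a text interface, by contrast, ASR emits a discrete token sequence, so the hard decoding step blocks gradient flow from NLU back into ASR; this is why the paper trains the text-interface configuration jointly through a sequence loss function rather than through ordinary backpropagation across the interface.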
Pages: 1253-1257
Page count: 5
Related papers
50 items in total (10 shown below)
• [1] SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
    Chung, Yu-An; Zhu, Chenguang; Zeng, Michael
    2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), pp. 1897-1907
• [2] Adaptive Training for Robust Spoken Language Understanding
    Garcia, Fernando; Sanchis, Emilio; Hurtado, Lluis-F.; Segarra, Encarna
    Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2015), 2015, vol. 9423, pp. 519-526
• [3] Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
    Chen, Qian; Wang, Wen; Zhang, Qinglin
    Interspeech 2021, pp. 1244-1248
• [4] Joint Spoken Language Understanding and Domain Adaptive Language Modeling
    Zhang, Huifeng; Zhu, Su; Fan, Shuai; Yu, Kai
    Intelligence Science and Big Data Engineering, 2018, vol. 11266, pp. 311-324
• [5] Joint Generative and Discriminative Models for Spoken Language Understanding
    Dinarelli, Marco; Moschitti, Alessandro; Riccardi, Giuseppe
    2008 IEEE Workshop on Spoken Language Technology (SLT 2008), pp. 61-64
• [6] A Joint Learning Framework With BERT for Spoken Language Understanding
    Zhang, Zhichang; Zhang, Zhenwen; Chen, Haoyuan; Zhang, Zhiman
    IEEE Access, 2019, vol. 7, pp. 168849-168858
• [7] Spoken Language Understanding
    Wang, YY; Deng, L; Acero, A
    IEEE Signal Processing Magazine, 2005, 22(5), pp. 16-31
• [8] Understanding Spoken Language
    Brown, G
    TESOL Quarterly, 1978, 12(3), pp. 271-283
• [9] OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language Understanding
    Kim, Young-Bum; Lee, Sungjin; Stratos, Karl
    2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 547-553
• [10] A Joint Multi-Task Learning Framework for Spoken Language Understanding
    Li, Changliang; Kong, Cunliang; Zhao, Yan
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6054-6058