On joint training with interfaces for spoken language understanding

Cited by: 1
Authors
Raju, Anirudh [1]
Rao, Milind [1]
Tiwari, Gautam [1]
Dheram, Pranav [1]
Anderson, Bryan [1]
Zhang, Zhe [1]
Lee, Chul [1]
Bui, Bach [1]
Rastrow, Ariya [1]
Affiliations
[1] Amazon Alexa AI, San Mateo, CA 94404 USA
Source
INTERSPEECH 2022
Keywords
speech recognition; spoken language understanding; neural interfaces; multitask training; networks; ASR
DOI
10.21437/Interspeech.2022-11067
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Spoken language understanding (SLU) systems extract both text transcripts and semantics, in the form of intents and slots, from input speech utterances. SLU systems usually consist of (1) an automatic speech recognition (ASR) module, (2) an interface module that exposes relevant outputs from ASR, and (3) a natural language understanding (NLU) module. Interfaces in SLU systems carry either text transcriptions or richer information, such as neural embeddings, from ASR to NLU. In this paper, we study how interfaces affect joint training for spoken language understanding. Most notably, we obtain state-of-the-art results on the publicly available 50-hr SLURP [1] dataset. We first leverage large pretrained ASR and NLU models connected by a text interface, and then jointly train both models via a sequence loss function. For scenarios where pretrained models are not used, the best results are obtained by joint sequence-loss training over richer neural interfaces. Finally, we show that the benefit of leveraging pretrained models diminishes as the training data size increases.
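The abstract describes a pipeline architecture (ASR module, interface, NLU module) trained jointly, where the interface carries either text hypotheses or neural embeddings from ASR to NLU. Below is a minimal PyTorch sketch of the neural-interface variant, not the authors' implementation: all module names, dimensions, and the intent-only loss are illustrative assumptions, and the paper's full joint objective also involves a sequence loss that is omitted here. The point of the sketch is that, with a neural interface, the NLU loss backpropagates through the interface into the ASR encoder, which is what makes the training joint.

import torch
import torch.nn as nn

class TinyASREncoder(nn.Module):
    """Stand-in for a pretrained ASR encoder that emits frame-level embeddings."""
    def __init__(self, feat_dim=80, hidden_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, audio_feats):            # (batch, frames, feat_dim)
        emb, _ = self.rnn(audio_feats)
        return emb                             # (batch, frames, hidden_dim)

class TinyNLUHead(nn.Module):
    """Stand-in for an NLU model that consumes the interface representation."""
    def __init__(self, hidden_dim=256, num_intents=60):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_intents)

    def forward(self, interface_repr):
        pooled = interface_repr.mean(dim=1)    # crude mean pooling over frames
        return self.classifier(pooled)         # intent logits

asr = TinyASREncoder()
nlu = TinyNLUHead()
# One optimizer over both modules: the defining property of joint training.
optim = torch.optim.Adam(list(asr.parameters()) + list(nlu.parameters()), lr=1e-4)

audio = torch.randn(4, 100, 80)                # dummy batch of 4 utterances
intent_labels = torch.randint(0, 60, (4,))     # dummy intent targets

optim.zero_grad()
logits = nlu(asr(audio))                       # neural interface: embeddings flow straight into NLU
loss = nn.functional.cross_entropy(logits, intent_labels)
loss.backward()                                # semantic-loss gradients reach the ASR encoder too
optim.step()

With a text interface, by contrast, ASR emits a discrete token sequence, so the hard decoding step blocks gradient flow from NLU back into ASR; this is why the paper trains the text-interface configuration jointly through a sequence loss function rather than through ordinary backpropagation across the interface.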
Pages: 1253-1257
Page count: 5
Related papers
50 items in total (10 shown below)
• [1] SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
    Chung, Yu-An; Zhu, Chenguang; Zeng, Michael
    2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), pp. 1897-1907
• [2] Adaptive Training for Robust Spoken Language Understanding
    Garcia, Fernando; Sanchis, Emilio; Hurtado, Lluis-F.; Segarra, Encarna
    Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2015), 2015, vol. 9423, pp. 519-526
• [3] Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
    Chen, Qian; Wang, Wen; Zhang, Qinglin
    Interspeech 2021, pp. 1244-1248
• [4] Joint Spoken Language Understanding and Domain Adaptive Language Modeling
    Zhang, Huifeng; Zhu, Su; Fan, Shuai; Yu, Kai
    Intelligence Science and Big Data Engineering, 2018, vol. 11266, pp. 311-324
• [5] Joint Generative and Discriminative Models for Spoken Language Understanding
    Dinarelli, Marco; Moschitti, Alessandro; Riccardi, Giuseppe
    2008 IEEE Workshop on Spoken Language Technology (SLT 2008), pp. 61-64
• [6] A Joint Learning Framework With BERT for Spoken Language Understanding
    Zhang, Zhichang; Zhang, Zhenwen; Chen, Haoyuan; Zhang, Zhiman
    IEEE Access, 2019, vol. 7, pp. 168849-168858
• [7] Spoken Language Understanding
    Wang, YY; Deng, L; Acero, A
    IEEE Signal Processing Magazine, 2005, 22(5), pp. 16-31
• [8] Understanding Spoken Language
    Brown, G
    TESOL Quarterly, 1978, 12(3), pp. 271-283
• [9] OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language Understanding
    Kim, Young-Bum; Lee, Sungjin; Stratos, Karl
    2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 547-553
• [10] A Joint Multi-Task Learning Framework for Spoken Language Understanding
    Li, Changliang; Kong, Cunliang; Zhao, Yan
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6054-6058