Joint Discriminative Decoding of Words and Semantic Tags for Spoken Language Understanding

Cited by: 11

Authors:

Deoras, Anoop [1]

Tur, Gokhan [1]

Sarikaya, Ruhi [1]

Hakkani-Tuer, Dilek [1]

Affiliations:

[1] Microsoft Corp, Mountain View, CA 94041 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 08

Keywords:

Joint Decoding; MaxEnt; CRF; SLU; ASR; lattice decoding; spoken language processing; speech and dialog understanding;

DOI:

10.1109/TASL.2013.2256894

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Most Spoken Language Understanding (SLU) systems today employ a cascade approach, where the best hypothesis from an Automatic Speech Recognizer (ASR) is fed into understanding modules such as slot sequence classifiers and intent detectors. The output of these modules is then fed into downstream components such as an interpreter and/or a knowledge broker. These statistical models are usually trained individually to optimize the error rate of their respective outputs. In such approaches, errors from one module irreversibly propagate into other modules, causing a serious degradation in the overall performance of the SLU system. Thus it is desirable to jointly optimize all the statistical models together. As a first step towards this, in this paper, we propose a joint decoding framework in which we predict the optimal word sequence and slot sequence (semantic tag sequence) jointly, given the input acoustic stream. Furthermore, the improved recognition output is then used for an utterance classification task; specifically, we focus on the intent detection task. On an SLU task, we show a 1.5% absolute reduction (7.6% relative reduction) in word error rate (WER) and a 1.2% absolute improvement in F-measure for slot prediction when compared to a very strong cascade baseline comprising a state-of-the-art large vocabulary ASR followed by a conditional random field (CRF) based slot sequence tagger. Similarly, for intent detection, we show a 1.2% absolute reduction (12% relative reduction) in classification error rate.
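The abstract reports each error reduction in both absolute and relative terms. Since a relative reduction is the absolute reduction divided by the baseline rate, the two figures together imply an approximate baseline. The sketch below works through that arithmetic. The function name `implied_baseline` is a hypothetical helper introduced here for illustration. The baselines it prints are not stated in the abstract and are only as precise as the rounded figures allow.

```python
def implied_baseline(absolute_points: float, relative_percent: float) -> float:
    """Return the baseline error rate (in %) implied by the identity
    relative reduction = absolute reduction / baseline."""
    if relative_percent <= 0:
        raise ValueError("relative_percent must be positive")
    return absolute_points / (relative_percent / 100.0)


# Figures quoted in the abstract:
# (absolute reduction in percentage points, relative reduction in %).
reported = {
    "word error rate": (1.5, 7.6),
    "intent classification error rate": (1.2, 12.0),
}

for metric, (absolute, relative) in reported.items():
    # These are derived estimates, not values reported in the abstract.
    baseline = implied_baseline(absolute, relative)
    improved = baseline - absolute
    print(f"{metric}: implied baseline ~{baseline:.1f}%, "
          f"after joint decoding ~{improved:.1f}%")
```

Under these assumptions the implied cascade baseline is roughly 19.7% WER (about 18.2% after joint decoding) and roughly 10% intent classification error (about 8.8% after). The F-measure gain is given only in absolute terms, so no baseline can be inferred for it.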

Pages: 1612-1621

Page count: 10

50 items in total

[31] SPOKEN LANGUAGE UNDERSTANDING FROM UNALIGNED DATA USING DISCRIMINATIVE CLASSIFICATION MODELS
Mairesse, F.
Gasic, M.
Jurcicek, F.
Keizer, S.
Thomson, B.
Yu, K.
Young, S.
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4749 - 4752
[32] Discriminative vector for spoken language recognition
Ma, Bin
Tong, Rong
Li, Haizhou
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1001 - +
[33] Learning Semantic Hierarchy with Distributed Representations for Unsupervised Spoken Language Understanding
Chen, Yun-Nung
Wang, William Yang
Rudnicky, Alexander I.
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1869 - 1873
[34] Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding
Stehwien, Sabrina
Ngoc Thang Vu
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 730 - 734
[35] Deep Belief Network based Semantic Taggers for Spoken Language Understanding
Deoras, Anoop
Sarikaya, Ruhi
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2712 - 2716
[36] UNDERSTANDING SPOKEN LANGUAGE
BROWN, G
TESOL QUARTERLY, 1978, 12 (03) : 271 - 283
[37] Spoken language understanding
Wang, YY
Deng, L
Acero, A
IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22 (05) : 16 - 31
[38] Brain-Based Translation: fMRI Decoding of Spoken Words in Bilinguals Reveals Language-Independent Semantic Representations in Anterior Temporal Lobe
Correia, Joao
Formisano, Elia
Valente, Giancarlo
Hausfeld, Lars
Jansma, Bernadette
Bonte, Milene
JOURNAL OF NEUROSCIENCE, 2014, 34 (01) : 332 - 338
[39] SEMANTIC CONSTRAINT ON DECODING OF AMBIGUOUS WORDS
PERFETTI, CA
GOODMAN, D
JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 1970, 86 (03) : 420 - &
[40] SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
Chung, Yu-An
Zhu, Chenguang
Zeng, Michael
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1897 - 1907
