Beyond ASR 1-best: Using word confusion networks in spoken language understanding

Cited by: 84

Authors:

Hakkani-Tur, Dilek

Bechet, Frederic

Riccardi, Giuseppe

Tur, Gokhan

Affiliations:

[1] AT&T Labs Res, Florham Pk, NJ 07932 USA

[2] Univ Avignon, CNRS, LIA, F-84911 Avignon 09, France

[3] Univ Trent, I-38100 Trento, Italy

Source:

COMPUTER SPEECH AND LANGUAGE | 2006 / Vol. 20 / Issue 04

Keywords:

DOI:

10.1016/j.csl.2005.07.005

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6-10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output. © 2005 Elsevier Ltd. All rights reserved.
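To make the idea of a WCN concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a WCN can be modeled as a sequence of aligned bins, each holding competing words with confidence scores. The names `one_best` and `weighted_words`, the `<eps>` empty-word marker, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a WCN is a list of bins; each bin lists competing
# (word, confidence) pairs for one aligned position in the utterance.
Bin = List[Tuple[str, float]]
EPS = "<eps>"  # hypothetical marker for "no word at this position"


def one_best(wcn: List[Bin]) -> List[str]:
    """Take the top-scoring word in every bin, dropping empty words."""
    best = [max(b, key=lambda arc: arc[1])[0] for b in wcn]
    return [w for w in best if w != EPS]


def weighted_words(wcn: List[Bin], min_score: float = 0.0) -> Dict[str, float]:
    """Keep every word scoring at least min_score, not just the winners.

    Each word maps to its highest confidence across bins, so a
    downstream classifier could use alternatives the 1-best discards.
    """
    feats: Dict[str, float] = {}
    for b in wcn:
        for word, score in b:
            if word != EPS and score >= min_score:
                feats[word] = max(score, feats.get(word, 0.0))
    return feats


# Toy network with invented scores (not data from the paper).
wcn = [
    [("i", 0.9), ("a", 0.1)],
    [("want", 0.6), ("won't", 0.4)],
    [(EPS, 0.7), ("a", 0.3)],
    [("refund", 0.55), ("fund", 0.45)],
]

print(one_best(wcn))
print(weighted_words(wcn, min_score=0.2))
```

In this toy case the 1-best path keeps only one word per position, while `weighted_words` also retains lower-scoring alternatives such as "won't" and "fund" together with their confidences, which is the kind of extra evidence the abstract describes exploiting.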

Pages: 495-514

Page count: 20

50 records in total

[41] Automatic Speech Recognition of Code Switching Speech using 1-Best Rescoring
Ahmed, Basem H. A.
Tan, Tien-Ping
2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 137 - 140
[42] Open-vocabulary spoken utterance retrieval using confusion networks
Hori, Takaaki
Hetherington, I. Lee
Hazen, Timothy J.
Glass, James R.
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 73 - +
[43] A Spoken Language Understanding Approach Using Successive Learners
Wu, Wei-Lin
Lu, Ru-Zhan
Liu, Hui
Gao, Feng
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1906 - 1909
[44] A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING
Peng, Yifan
Arora, Siddhant
Higuchi, Yosuke
Ueda, Yushi
Kumar, Sujay
Ganesan, Karthik
Dalmia, Siddharth
Chang, Xuankai
Watanabe, Shinji
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 406 - 413
[45] Spoken language understanding using weakly supervised learning
Wu, Wei-Lin
Lu, Ru-Zhan
Duan, Jian-Yong
Liu, Hui
Gao, Feng
Chen, Yu-Quan
COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02): : 358 - 382
[46] C2A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding
Cheng, Xuxin
Yao, Ziyu
Zhu, Zhihong
Li, Yaowei
Li, Hongxiang
Zou, Yuexian
INTERSPEECH 2023, 2023, : 695 - 699
[47] MCLF: A Multi-grained Contrastive Learning Framework for ASR-robust Spoken Language Understanding
Huang, Zhiqi
Chen, Dongsheng
Zhu, Zhihong
Cheng, Xuxin
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7936 - 7949
[48] Sequential Convolutional Neural Networks for Slot Filling in Spoken Language Understanding
Ngoc Thang Vu
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3250 - 3254
[49] Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding
Vukotic, Vedran
Raymond, Christian
INTERSPEECH 2019, 2019, : 1178 - 1182
[50] Extending boosting for call classification using word confusion networks
Tur, G
Hakkani-Tür, D
Riccardi, G
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 437 - 440
