Beyond ASR 1-best: Using word confusion networks in spoken language understanding

Cited by: 84
|
Authors
Hakkani-Tur, Dilek
Bechet, Frederic
Riccardi, Giuseppe
Tur, Gokhan
Affiliations
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
[2] Univ Avignon, CNRS, LIA, F-84911 Avignon 09, France
[3] Univ Trent, I-38100 Trento, Italy
Source
COMPUTER SPEECH AND LANGUAGE | 2006 / Vol. 20 / Issue 04
DOI
10.1016/j.csl.2005.07.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large-vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6-10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output. (c) 2005 Elsevier Ltd. All rights reserved.
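As a rough illustration of the data structure the abstract refers to, the following sketch models a WCN as a sequence of bins, each holding competing words with confidence scores. It is not the authors' implementation: the example words, the scores, the `<eps>` empty-arc marker, and the helper names `one_best` and `weighted_ngrams` are all assumptions made for illustration, and scoring an n-gram by the product of its word posteriors is one simple choice among several.

```python
from itertools import product

# Hypothetical marker for an empty (skip) arc in a bin.
EPS = "<eps>"

# Toy WCN: one dict per bin, mapping candidate word -> posterior.
# All words and scores below are made up for illustration.
wcn = [
    {"i": 0.9, "a": 0.1},
    {"want": 0.6, "won't": 0.4},
    {"to": 0.7, EPS: 0.3},
    {"fly": 0.55, "buy": 0.45},
]


def one_best(wcn):
    """Take the highest-posterior word in each bin, dropping empty arcs."""
    best = [max(b, key=b.get) for b in wcn]
    return [w for w in best if w != EPS]


def weighted_ngrams(wcn, n):
    """Collect n-grams over n consecutive bins.

    Each n-gram is scored by the product of its word posteriors.
    For simplicity, n-grams that pass through an empty arc are skipped.
    """
    feats = {}
    for start in range(len(wcn) - n + 1):
        bins = wcn[start:start + n]
        for path in product(*(b.items() for b in bins)):
            words = tuple(w for w, _ in path)
            if EPS in words:
                continue
            score = 1.0
            for _, p in path:
                score *= p
            feats[words] = feats.get(words, 0.0) + score
    return feats


if __name__ == "__main__":
    print(one_best(wcn))
    for ngram, score in sorted(weighted_ngrams(wcn, 2).items()):
        print(ngram, round(score, 3))
```

A downstream classifier or entity tagger could consume the weighted n-grams in place of features taken from the 1-best string alone, so that lower-ranked but plausible words such as "buy" still contribute evidence.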
Pages: 495-514
Page count: 20
Related papers
50 in total
  • [41] Automatic Speech Recognition of Code Switching Speech using 1-Best Rescoring
    Ahmed, Basem H. A.
    Tan, Tien-Ping
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 137 - 140
  • [42] Open-vocabulary spoken utterance retrieval using confusion networks
    Hori, Takaaki
    Hetherington, I. Lee
    Hazen, Timothy J.
    Glass, James R.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 73 - +
  • [43] A Spoken Language Understanding Approach Using Successive Learners
    Wu, Wei-Lin
    Lu, Ru-Zhan
    Liu, Hui
    Gao, Feng
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1906 - 1909
  • [44] A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING
    Peng, Yifan
    Arora, Siddhant
    Higuchi, Yosuke
    Ueda, Yushi
    Kumar, Sujay
    Ganesan, Karthik
    Dalmia, Siddharth
    Chang, Xuankai
    Watanabe, Shinji
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 406 - 413
  • [45] Spoken language understanding using weakly supervised learning
    Wu, Wei-Lin
    Lu, Ru-Zhan
    Duan, Jian-Yong
    Liu, Hui
    Gao, Feng
    Chen, Yu-Quan
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02): : 358 - 382
  • [46] C2A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding
    Cheng, Xuxin
    Yao, Ziyu
    Zhu, Zhihong
    Li, Yaowei
    Li, Hongxiang
    Zou, Yuexian
    INTERSPEECH 2023, 2023, : 695 - 699
  • [47] MCLF: A Multi-grained Contrastive Learning Framework for ASR-robust Spoken Language Understanding
    Huang, Zhiqi
    Chen, Dongsheng
    Zhu, Zhihong
    Cheng, Xuxin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7936 - 7949
  • [48] Sequential Convolutional Neural Networks for Slot Filling in Spoken Language Understanding
    Ngoc Thang Vu
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3250 - 3254
  • [49] Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding
    Vukotic, Vedran
    Raymond, Christian
    INTERSPEECH 2019, 2019, : 1178 - 1182
  • [50] Extending boosting for call classification using word confusion networks
    Tur, G
    Hakkani-Tür, D
    Riccardi, G
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 437 - 440