Beyond ASR 1-best: Using word confusion networks in spoken language understanding

Cited by: 84
Authors
Hakkani-Tur, Dilek
Bechet, Frederic
Riccardi, Giuseppe
Tur, Gokhan
Affiliations
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
[2] Univ Avignon, CNRS, LIA, F-84911 Avignon 09, France
[3] Univ Trent, I-38100 Trento, Italy
Source
COMPUTER SPEECH AND LANGUAGE | 2006 / Vol. 20 / Issue 04
DOI
10.1016/j.csl.2005.07.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6-10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output. (c) 2005 Elsevier Ltd. All rights reserved.
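To make the WCN representation described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a WCN can be modeled as a sequence of bins, each holding competing words with posterior probabilities. The `EPS` symbol, the function names `one_best` and `weighted_unigrams`, and the toy posteriors are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

EPS = "<eps>"  # hypothetical symbol meaning "no word in this slot"

# Assumed model: a WCN is a list of bins; each bin lists (word, posterior) pairs.
Bin = List[Tuple[str, float]]


def one_best(wcn: List[Bin]) -> List[str]:
    """Pick the highest-posterior entry in each bin and drop epsilons."""
    best = [max(b, key=lambda wp: wp[1])[0] for b in wcn]
    return [w for w in best if w != EPS]


def weighted_unigrams(wcn: List[Bin]) -> Dict[str, float]:
    """Sum posteriors per word across bins (a soft bag-of-words feature map)."""
    feats: Dict[str, float] = {}
    for b in wcn:
        for word, post in b:
            if word != EPS:
                feats[word] = feats.get(word, 0.0) + post
    return feats


# Toy network with made-up posteriors (not data from the paper).
wcn = [
    [("I", 0.9), (EPS, 0.1)],
    [("want", 0.6), ("won't", 0.4)],
    [("a", 0.5), ("the", 0.3), (EPS, 0.2)],
    [("refund", 0.55), ("fund", 0.45)],
]

print(one_best(wcn))
print(weighted_unigrams(wcn))
```

In this sketch, `one_best` discards every competing word, while `weighted_unigrams` keeps alternatives such as "fund" alongside "refund", each weighted by its confidence. That is the kind of extra evidence a downstream classifier could draw on when the top hypothesis is wrong.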
Pages: 495-514
Page count: 20