Joint Decoding for Speech Recognition and Semantic Tagging

Cited by: 0
Authors
Deoras, Anoop [1]
Sarikaya, Ruhi [1]
Tur, Gokhan [1]
Hakkani-Tuer, Dilek [1]
Affiliation
[1] Microsoft Corp, Mountain View, CA 94041 USA
Keywords
ME; CRF; SLU; CU; ASR;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Most conversational understanding (CU) systems today employ a cascade approach, in which the best hypothesis from the automatic speech recognizer (ASR) is fed into a spoken language understanding (SLU) module, whose best hypothesis is in turn fed into other systems such as an interpreter or dialog manager. In such approaches, errors from one statistical module irreversibly propagate into the next, causing a serious degradation in the overall performance of the conversational understanding system. It is therefore desirable to optimize all the statistical modules jointly. As a first step in this direction, we propose a joint decoding framework that predicts the optimal word sequence and slot (semantic tag) sequence jointly, given the input acoustic stream. On Microsoft's CU system, we show a 1.3% absolute reduction in word error rate (WER) and a 1.2% absolute improvement in F-measure for slot prediction compared with a very strong cascade baseline consisting of a state-of-the-art recognizer followed by a slot sequence tagger.
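The contrast the abstract draws between cascade and joint decoding can be illustrated with a toy sketch. This is not the authors' framework, whose details the abstract does not give; it substitutes simple n-best rescoring as a stand-in, and every hypothesis, slot label, and log-probability below is invented for illustration. The names `ASR_NBEST`, `TAGGER`, `cascade_decode`, and `joint_decode` are hypothetical.

```python
# Toy illustration only: all hypotheses and scores are invented.
# Each ASR hypothesis is (word string, log P(W | A)).
ASR_NBEST = [
    ("show flights to austin", -1.0),
    ("show flights to boston", -1.2),
]

# Hypothetical tagger output: for each word string, the best slot
# sequence and its log-probability log P(S | W).
TAGGER = {
    "show flights to austin": ("O O O B-city", -2.0),
    "show flights to boston": ("O O O B-city", -0.3),
}


def cascade_decode(nbest, tagger):
    """Pick the 1-best ASR hypothesis first, then tag only that one."""
    words, _ = max(nbest, key=lambda hyp: hyp[1])
    slots, _ = tagger[words]
    return words, slots


def joint_decode(nbest, tagger):
    """Pick the hypothesis maximizing log P(W | A) + log P(S | W)."""
    def joint_score(hyp):
        words, asr_logprob = hyp
        return asr_logprob + tagger[words][1]

    words, _ = max(nbest, key=joint_score)
    return words, tagger[words][0]


if __name__ == "__main__":
    print("cascade:", cascade_decode(ASR_NBEST, TAGGER))
    print("joint:  ", joint_decode(ASR_NBEST, TAGGER))
```

In this invented example the cascade commits to the top ASR hypothesis before the tagger is consulted, so the tagger's preference for the second hypothesis cannot change the word sequence. The joint score lets evidence from the tagger influence which word sequence is chosen.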
Pages: 1066-1069
Page count: 4