Correcting, Rescoring and Matching: An N-best List Selection Framework for Speech Recognition

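Authors
Kuo, Chin-Hung [1]
Chen, Kuan-Yu [1]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract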
In recent years, automatic speech recognition (ASR) has been widely used in various scenarios, and it is usually the first step in many applications. As a result, a growing number of studies concentrate on improving recognition results. Among them, N-best reranking and error correction models are two active research subjects, and various models have been proposed and have demonstrated success. However, because N-best reranking models aim to select the best hypothesis from a set of candidates, their performance upper bound is limited by the given set of hypotheses. Error correction models detect and correct recognition errors to provide better results, but they usually operate on the highest-scored hypothesis only, so the information embedded in the other candidates is ignored. In addition, we note that almost all N-best reranking and error correction models consider acoustic information only implicitly or indirectly, or omit it altogether. To mitigate these flaws, we propose an N-best list selection framework for speech recognition, which consists of a text correction module, a text rescoring module, and a text-speech matching module. With the proposed framework, a set of corrected hypotheses can be derived, and the text rescoring module is then used to rescore them accurately. In addition, the text-speech matching module is employed to calculate the alignment score between each hypothesis and its corresponding speech. The proposed framework is evaluated on the AISHELL-1 dataset, and the experimental results show that it delivers character error rate reductions of over 30% compared to the baseline systems.
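The three-module flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `select_hypothesis`, the choice to keep the original hypotheses alongside the corrected ones, the linear interpolation with `match_weight`, and the toy `toy_correct`, `toy_rescore`, and `toy_match` stand-ins are all assumptions made for illustration. In a real system each callable would wrap a trained model, and the matching function would also take the audio as input.

```python
from typing import Callable, Sequence


def select_hypothesis(
    nbest: Sequence[str],
    correct: Callable[[str], str],
    rescore: Callable[[str], float],
    match: Callable[[str], float],
    match_weight: float = 0.5,
) -> str:
    """Pick one hypothesis from an N-best list (hypothetical sketch)."""
    # Step 1 (correction): correct every hypothesis, not only the top one.
    # Assumption: originals are kept too; duplicates are dropped in order.
    corrected = [correct(h) for h in nbest]
    candidates = list(dict.fromkeys(list(nbest) + corrected))

    # Steps 2 and 3 (rescoring and text-speech matching): combine both
    # scores. Assumption: a simple linear interpolation of the two.
    def combined(h: str) -> float:
        return (1.0 - match_weight) * rescore(h) + match_weight * match(h)

    return max(candidates, key=combined)


# Toy stand-ins for the three modules (all hypothetical).
VOCAB = {"i", "read", "a", "book"}
FIXES = {"bok": "book"}


def toy_correct(h: str) -> str:
    # Replace known misspellings word by word.
    return " ".join(FIXES.get(w, w) for w in h.split())


def toy_rescore(h: str) -> float:
    # Fraction of words found in a tiny vocabulary.
    words = h.split()
    return sum(w in VOCAB for w in words) / len(words)


def toy_match(h: str) -> float:
    # Pretend the audio clearly contains the word "read".
    return 1.0 if "read" in h.split() else 0.0


nbest = ["i red a book", "i read a bok"]
best = select_hypothesis(nbest, toy_correct, toy_rescore, toy_match)
print(best)
```

In this toy run neither original hypothesis is fully correct, so the selected output is one of the corrected candidates. This mirrors the abstract's point that reranking alone is bounded by the given hypothesis set.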
Pages: 729-734
Page count: 6