Correcting, Rescoring and Matching: An N-best List Selection Framework for Speech Recognition

Authors
Kuo, Chin-Hung [1]
Chen, Kuan-Yu [1]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, automatic speech recognition (ASR) has been widely deployed and is often the first step in many applications, so a growing number of studies concentrate on improving recognition results. Among them, N-best reranking and error correction models are two active research subjects, and various models have been proposed and have demonstrated their success. However, because N-best reranking models aim to select the best hypothesis from a set of candidates, their performance upper bound is limited by the given set of hypotheses. Error correction models detect and correct recognition errors to provide better results, but they usually operate on the highest-scored hypothesis only, so the information embedded in the other candidates is ignored. In addition, almost all N-best reranking and error correction models consider acoustic information only implicitly or indirectly, or omit it altogether. To mitigate these flaws, we propose an N-best list selection framework for speech recognition that consists of a text correction module, a text rescoring module, and a text-speech matching module. Based on the proposed framework, a set of corrected hypotheses is derived, and the text rescoring module is then used to rescore them accurately. In addition, the text-speech matching module computes an alignment score between each hypothesis and its corresponding speech. Evaluated on the AISHELL-1 dataset, the proposed framework delivers character error rate reductions of over 30% compared with the baseline systems.
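The three-module flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the modules are implemented or how their scores are combined, so everything here is an assumption made for illustration: the names `select_hypothesis`, `correct`, `rescore`, and `match` are hypothetical, the modules are passed in as plain callables, and the weighted sum of the rescoring and matching scores is only one plausible way to rank candidates. The helper `character_error_rate` uses the standard definition (character-level edit distance divided by reference length), which is the metric behind the reported error reduction.

```python
from typing import Callable, List


def select_hypothesis(
    nbest: List[str],
    correct: Callable[[str], str],
    rescore: Callable[[str], float],
    match: Callable[[str], float],
    rescore_weight: float = 1.0,  # assumed interpolation weight, not from the paper
    match_weight: float = 1.0,    # assumed interpolation weight, not from the paper
) -> str:
    """Return the corrected hypothesis with the highest combined score.

    `correct`, `rescore`, and `match` are hypothetical stand-ins for the text
    correction, text rescoring, and text-speech matching modules. `match` is
    assumed to be already bound to the utterance's audio.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    # Correct every candidate (not only the top one) and drop duplicates
    # while preserving the original N-best order.
    corrected = list(dict.fromkeys(correct(text) for text in nbest))

    def combined(text: str) -> float:
        # Assumed combination rule: a weighted sum of the two module scores.
        return rescore_weight * rescore(text) + match_weight * match(text)

    return max(corrected, key=combined)


def edit_distance(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

In use, each callable would wrap a trained model; with toy stand-ins such as `correct=lambda t: t` and constant scores, the function simply returns the first deduplicated candidate. Correcting all N candidates before ranking reflects the abstract's point that the information in lower-ranked hypotheses should not be discarded.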
Pages: 729-734
Number of pages: 6