Correcting, Rescoring and Matching: An N-best List Selection Framework for Speech Recognition

Cited by: 0

Authors:

Kuo, Chin-Hung [1]

Chen, Kuan-Yu [1]

Affiliation:

[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan

Source:

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, automatic speech recognition (ASR) has been widely used in various scenarios, and it is usually the first step in many applications. Therefore, a growing number of studies concentrate on improving recognition results. Among them, N-best reranking and error correction models are two active research subjects, and various models have been proposed and have demonstrated their success. However, because N-best reranking models aim to select the best hypothesis from a set of candidates, their performance upper bound is limited by the given set of hypotheses. Error correction models detect and correct recognition errors so as to provide better results, but they usually process only the highest-scored hypothesis, so the information embedded in the other candidates is ignored. Besides, we note that almost all N-best reranking and error correction models consider acoustic information only implicitly or indirectly, or omit it altogether. To mitigate these flaws, we propose an N-best list selection framework for speech recognition, which consists of a text correction module, a text rescoring module, and a text-speech matching module. Based on the proposed framework, a set of corrected hypotheses can be deduced, and the text rescoring module is then introduced to accurately rescore them. In addition, the text-speech matching module is employed to calculate the alignment score between each hypothesis and its corresponding speech. The proposed framework is evaluated on the AISHELL-1 dataset, and the experimental results reveal that it can deliver a character error rate reduction of over 30% compared to the baseline systems.
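
The three-module flow described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name select_hypothesis, the callables correct, rescore and match, and the linear interpolation with weight are all hypothetical, since the abstract does not say how the rescoring and matching scores are combined or whether uncorrected hypotheses remain in the candidate pool.

```python
from typing import Callable, List


def select_hypothesis(
    nbest: List[str],
    correct: Callable[[str], str],
    rescore: Callable[[str], float],
    match: Callable[[str], float],
    weight: float = 0.5,
) -> str:
    """Pick one hypothesis from an N-best list.

    All callables are hypothetical stand-ins:
      correct(text) -> corrected text         (text correction module)
      rescore(text) -> higher-is-better score (text rescoring module)
      match(text)   -> text-speech alignment score for the utterance
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")

    # Correct every hypothesis, then drop duplicates while keeping order.
    candidates = list(dict.fromkeys(correct(text) for text in nbest))

    # Assumption: the two scores are merged by linear interpolation.
    def combined(text: str) -> float:
        return (1.0 - weight) * rescore(text) + weight * match(text)

    return max(candidates, key=combined)
```

With toy stand-ins such as correct = lambda t: t.replace("teh", "the"), a rescorer that prefers one candidate, and a constant matcher, the function returns the candidate favoured by the rescorer. Setting weight to 1.0 makes the choice depend on the matching score alone.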


Pages: 729-734

Number of pages: 6

50 related records in total

[41] Morpho-syntactic post-processing of N-best lists for improved French automatic speech recognition
Huet, Stephane
Gravier, Guillaume
Sebillot, Pascale
COMPUTER SPEECH AND LANGUAGE, 2010, 24 (04): 663 - 684
[42] 3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers
Nakamura, S
Heracleous, P
FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS, 2002: 59 - 63
[43] IMPROVING NONNATIVE SPEECH UNDERSTANDING USING CONTEXT AND N-BEST MEANING FUSION
Xu, Yushi
Seneff, Stephanie
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4977 - 4980
[44] DISCRIMINATIVE LEARNING USING LINGUISTIC FEATURES TO RESCORE N-BEST SPEECH HYPOTHESES
Georgescul, Maria
Rayner, Manny
Bouillon, Pierrette
Tsourakis, Nikos
2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008: 97 - 100
[45] Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring
Novak, Josef R.
Dixon, Paul R.
Minematsu, Nobuaki
Hirose, Keikichi
Hori, Chiori
Kashioka, Hideki
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2525 - 2528
[46] Channel Selection Using N-Best Hypothesis for Multi-Microphone ASR
Wolf, Martin
Nadeu, Climent
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 3474 - 3478
[47] Simultaneous recognition of distant-talking speech of multiple talkers based on the 3-D N-best search method
Heracleous, P
Nakamura, S
Shikano, K
JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3): 105 - 116
[48] Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers Based on the 3-D N-Best Search Method
Panikos Heracleous
Satoshi Nakamura
Kiyohiro Shikano
Journal of VLSI signal processing systems for signal, image and video technology, 2004, 36: 105 - 116
[49] Morphosyntactic Processing of N-Best Lists for Improved Recognition and Confidence Measure Computation
Huet, Stephane
Gravier, Guillaume
Sebillot, Pascale
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 1989 - 1992
[50] Character confidence based on N-best list for keyword spotting in online Chinese handwritten documents
Zhang, Heng
Wang, Da-Han
Liu, Cheng-Lin
PATTERN RECOGNITION, 2014, 47 (05): 1880 - 1890
