An Experimental Study on Structural-MAP Approaches to Implementing Very Large Vocabulary Speech Recognition Systems for Real-World Tasks

Cited by: 0

Authors:

Chen, I-Fan [1]

Siniscalchi, Sabato Marco [1,2]

Moon, Seokyong [3]

Shin, Daejin [3]

Koo, Myong-Wan [4]

Chung, Minhwa [5]

Lee, Chin-Hui [1]

Affiliations:

[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

[2] Univ Enna, I-94100 Enna, Italy

[3] Infinity Telecom Co Ltd, Seoul, South Korea

[4] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea

[5] Seoul Natl Univ, Dept Linguist, Seoul, South Korea

Source:

2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2013

Keywords:

HIDDEN MARKOV MODEL; DISCRIMINATIVE UTTERANCE VERIFICATION; MAXIMUM-LIKELIHOOD APPROACH; MIXTURE OBSERVATIONS; SPEAKER ADAPTATION; LANGUAGE MODEL; CLASSIFICATION; COMPENSATION; PARAMETERS;

DOI:

Not available

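Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: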

0808 ; 0809 ;

Abstract:

In this paper we present an experimental study exploiting structural Bayesian adaptation to handle potential mismatches between training and test conditions in the real-world applications to be realized in our multilingual very large vocabulary speech recognition (VLVSR) system project, sponsored by MOTIE (the Ministry of Trade, Industry and Energy), Republic of Korea. The goal of the project is to construct a nationwide VLVSR cloud service platform for mobile applications. Besides system architecture design issues, at such a large scale, performance robustness problems caused by mismatches in speakers, tasks, environments, and domains also need to be taken into account very carefully. We adopt adaptation techniques, especially structural MAP, to reduce the system accuracy degradation caused by these mismatches. As part of this ongoing project, we describe how structural MAP approaches can be used to adapt both the acoustic and language models of our VLVSR systems, and provide convincing experimental results demonstrating how adaptation can be used to bridge the performance gap between current state-of-the-art and deployable VLVSR systems.

Pages: 10

13 items in total

[1] MAP Based Speaker Adaptation in Very Large Vocabulary Speech Recognition of Czech
Cerva, Petr
Nouza, Jan
RADIOENGINEERING, 2004, 13 (03) : 42 - 46
[2] HANDS-FREE SPEECH RECOGNITION CHALLENGE FOR REAL-WORLD SPEECH DIALOGUE SYSTEMS
Saruwatari, Hiroshi
Kawanami, Hiromichi
Takeuchi, Shota
Takahashi, Yu
Cincarek, Tobias
Shikano, Kiyohiro
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 3729 - 3732
[3] Efficient Embedded Speech Recognition for Very Large Vocabulary Mandarin Car-Navigation Systems
Qian, Yanmin
Liu, Jia
Johnson, Michael T.
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (03) : 1496 - 1500
[4] Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
Schruefer, Oliver
Milling, Manuel
Burkhardt, Felix
Eyben, Florian
Schuller, Bjoern
INTERSPEECH 2024, 2024, : 3210 - 3214
[5] A Study and Experimental Results for Sound Recognition in Real-world Robot Interaction
Lee, Sang-Rae
Yoon, Ho-Sub
Hahn, Moon-Sung
Chung, Myung-Ae
2012 9TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2012, : 26 - 29
[6] Small and Large Vocabulary Speech Recognition of MP3 Data under Real-Word Conditions: Experimental Study
Pollak, Petr
Borsky, Michal
E-BUSINESS AND TELECOMMUNICATIONS, 2012, 314 : 409 - +
[7] Introducing a process framework for implementing models of large-scale real-world systems in software
Andreou, Andreas S.
Software Process Improvement and Practice, 2004, 9 (03) : 133 - 155
[8] Study on Speaker-Independent Emotion Recognition from Speech on Real-World Data
Kostoulas, Theodoros
Ganchev, Todor
Fakotakis, Nikos
VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042 : 235 - 242
[9] Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set
Cesarini, Valerio
Costantini, Giovanni
APPLIED SCIENCES-BASEL, 2024, 14 (23):
[10] The cafeteria study: Effects of facial masks, hearing protection, and real-world noise on speech recognition
Barrett, Mary E.
Gordon-Salant, Sandra
Brungart, Douglas S.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2021, 150 (06) : 4244 - 4255