An Experimental Study on Structural-MAP Approaches to Implementing Very Large Vocabulary Speech Recognition Systems for Real-World Tasks

Cited by: 0
Authors
Chen, I-Fan [1]
Siniscalchi, Sabato Marco [1, 2]
Moon, Seokyong [3]
Shin, Daejin [3]
Koo, Myong-Wan [4]
Chung, Minhwa [5]
Lee, Chin-Hui [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Univ Enna, I-94100 Enna, Italy
[3] Infinity Telecom Co Ltd, Seoul, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
[5] Seoul Natl Univ, Dept Linguist, Seoul, South Korea
Keywords
HIDDEN MARKOV MODEL; DISCRIMINATIVE UTTERANCE VERIFICATION; MAXIMUM-LIKELIHOOD APPROACH; MIXTURE OBSERVATIONS; SPEAKER ADAPTATION; LANGUAGE MODEL; CLASSIFICATION; COMPENSATION; PARAMETERS;
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract
In this paper we present an experimental study that exploits structural Bayesian adaptation to handle potential mismatches between training and test conditions in real-world applications. The work is part of our multilingual very large vocabulary speech recognition (VLVSR) system project, sponsored by MOTIE (the Ministry of Trade, Industry and Energy), Republic of Korea. The goal of the project is to construct a nationwide VLVSR cloud service platform for mobile applications. Besides system architecture design issues, at such a large scale, performance robustness problems caused by mismatches in speakers, tasks, environments, and domains also need to be taken into account very carefully. We adopt adaptation techniques, in particular structural MAP, to reduce the degradation in system accuracy caused by these mismatches. As part of this ongoing project, we describe how structural MAP approaches can be used to adapt both the acoustic and language models of our VLVSR systems, and we provide experimental results demonstrating how adaptation can be used to bridge the performance gap between the current state-of-the-art and deployable VLVSR systems.
Pages: 10