Multilingual phone recognition of spontaneous telephone speech

Cited by: 0
Authors
Corredor-Ardoy, C [1]
Lamel, L [1]
Adda-Decker, M [1]
Gauvain, JL [1]
Affiliation
[1] BOUYGUES TELECOM, F-78944 Velizy, France
Keywords
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this paper we report on phone recognition experiments with spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing telephone speech in French, British English, German and Castilian Spanish. We investigated the influence of the composition of the training material (size and linguistic content) on recognition performance using context-independent (CI) Hidden Markov Models and phonotactic bigram models. We found that when testing on spontaneous speech data, using only spontaneous speech training data gave the highest phone accuracies for the four languages, even though this data comprises only 14% of the available training data. The use of context-dependent (CD) HMMs reduced the phone error across the four languages, with the average error reduced to 51.9% from the 57.4% obtained with CI models. We suggest a straightforward way of detecting non-speech phenomena. The basic idea is to remove sequences of consonants between two silence labels from the recognized phone strings prior to scoring. This simple technique reduces the average phone error rate by 5.4% relative. The lowest phone error with CD models and filtering was obtained for Spanish (39.1%), with the four-language average being 49.1%.
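The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the silence label `sil`, the consonant inventory `CONSONANTS`, and the function names `filter_consonant_runs` and `relative_reduction` are all hypothetical, and the choice to keep both surrounding silence labels after a run is removed is an assumption, since the abstract does not say how they are handled. The second helper only checks the arithmetic behind the relative error reduction quoted in the abstract.

```python
from typing import List, Sequence, Set

# Hypothetical labels: the abstract does not give the actual phone set.
SILENCE = "sil"
CONSONANTS: Set[str] = {"p", "t", "k", "b", "d", "g", "f", "s", "z", "m", "n", "l", "r"}


def filter_consonant_runs(
    phones: Sequence[str],
    consonants: Set[str] = CONSONANTS,
    silence: str = SILENCE,
) -> List[str]:
    """Drop every consonant-only run that lies between two silence labels.

    Both silence labels are kept (an assumption of this sketch).
    """
    out: List[str] = []
    i, n = 0, len(phones)
    while i < n:
        out.append(phones[i])
        if phones[i] == silence:
            # Scan the run of consonants that follows this silence label.
            j = i + 1
            while j < n and phones[j] in consonants:
                j += 1
            # Remove the run only if it is non-empty and closed by a silence.
            if j > i + 1 and j < n and phones[j] == silence:
                i = j  # the closing silence is appended on the next pass
                continue
        i += 1
    return out


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction, as a fraction of the 'before' error."""
    return (before - after) / before


if __name__ == "__main__":
    hyp = ["sil", "t", "s", "sil", "a", "m", "i", "sil"]
    print(filter_consonant_runs(hyp))
    # Average phone error with CD models, before and after filtering.
    print(round(100 * relative_reduction(51.9, 49.1), 1))
```

In this sketch a run is removed only when it is bounded by silence on both sides, so consonants at the very start or end of a string without a neighbouring silence label are left untouched.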
Pages: 413-416
Page count: 4