Multilingual phone recognition of spontaneous telephone speech

Authors
Corredor-Ardoy, C [1]
Lamel, L [1]
Adda-Decker, M [1]
Gauvain, JL [1]
Affiliation
[1] BOUYGUES TELECOM, F-78944 Velizy, France
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
In this paper we report on phone recognition experiments with spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing telephone speech in French, British English, German and Castilian Spanish. We investigated the influence of the composition of the training material (size and linguistic content) on recognition performance using context-independent (CI) Hidden Markov Models (HMMs) and phonotactic bigram models. We found that when testing on spontaneous speech data, using only spontaneous speech training data gave the highest phone accuracies for the four languages, even though this data comprises only 14% of the available training data. The use of context-dependent (CD) HMMs reduced the phone error rate across the four languages, with the average error reduced to 51.9% from the 57.4% obtained with CI models. We suggest a straightforward way of detecting non-speech phenomena. The basic idea is to remove sequences of consonants between two silence labels from the recognized phone strings prior to scoring. This simple technique reduces the average phone error rate by 5.4% relative. The lowest phone error rate with CD models and filtering was obtained for Spanish (39.1%), with the four-language average being 49.1%.
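The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the silence label `sil`, the toy vowel set, and the function names `filter_consonant_runs` and `relative_reduction` are assumptions made for illustration only. The abstract does not say whether the two surrounding silence labels are merged after filtering, so this sketch leaves both in place.

```python
from typing import List, Set

# Assumed label conventions (not taken from the paper).
SILENCE = "sil"
VOWELS: Set[str] = {"a", "e", "i", "o", "u"}


def filter_consonant_runs(phones: List[str],
                          silence: str = SILENCE,
                          vowels: Set[str] = VOWELS) -> List[str]:
    """Drop every run of phones that lies between two silence labels
    and contains no vowel. All other phones are kept unchanged."""
    out: List[str] = []
    pending: List[str] = []   # phones seen since the last silence label
    seen_silence = False      # True once a left-hand silence exists
    for p in phones:
        if p == silence:
            consonants_only = bool(pending) and all(q not in vowels for q in pending)
            if not (seen_silence and consonants_only):
                out.extend(pending)  # keep the run
            pending = []
            out.append(p)
            seen_silence = True
        else:
            pending.append(p)
    out.extend(pending)  # trailing phones have no closing silence: keep them
    return out


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction in percent."""
    return (before - after) / before * 100.0


hyp = ["sil", "t", "k", "sil", "o", "l", "a", "sil"]
print(filter_consonant_runs(hyp))
print(round(relative_reduction(51.9, 49.1), 1))
```

With the averages quoted in the abstract, going from 51.9% to 49.1% is a drop of 2.8 points absolute, which corresponds to the 5.4% relative reduction reported.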
Pages: 413-416
Number of pages: 4