Study of relationships between intra-speaker's speech variability and speech recognition performance

Cited by: 0
Authors
Tsuge, Satoru [1]
Fukumi, Minoru [1]
Shishibori, Masami [1]
Ren, Fuji [1]
Kita, Kenji [1]
Kuroiwa, Shingo [1]
Affiliation
[1] Univ Tokushima, 2-1 Minami Josanjima, Tokushima 7708506, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Even when a speaker uses a speaker-dependent speech recognition system, recognition performance varies. This is because speech quality is affected by factors such as emotion and background noise, even when the speaker and the utterance remain the same. However, the relationships between intra-speaker speech variability and speech recognition performance are not clear. We therefore focus on the intra-speaker speech variability that affects speech recognition performance. To investigate these relationships, we have been collecting speech data since November 2002, and we conducted speech recognition experiments using part of this corpus. In this paper, we use correlation analysis to examine the relationships between intra-speaker speech variability and phoneme accuracy. The factors in the correlation analysis are the number of errors, the speaking rate, and the likelihood. The results show a strong correlation between the number of substitution errors and phoneme accuracy, whereas the correlations for the numbers of deletion and insertion errors are low. We therefore consider that phonemes overlap because the feature parameters vary with the speaking rate. To improve phoneme accuracy, a method that discriminates between phonemes needs to be studied. On the other hand, although the correlation between phoneme accuracy and speaking rate appears to be low, strong correlations are found between the speaking rate and the numbers of deletion and insertion errors. Because the number of insertion errors and the number of deletion errors offset each other, the correlation between the speaking rate and phoneme accuracy was low. Nevertheless, we consider that the speaking rate needs to be normalized, because it influences the numbers of deletion and insertion errors.
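The offsetting effect described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of phoneme accuracy, (N - S - D - I) / N, which the abstract itself does not state, and it uses Pearson correlation as the correlation measure, which is also an assumption. Every number below is an invented toy value chosen so that deletions rise and insertions fall with the speaking rate; none of it comes from the authors' corpus or results. The names `phoneme_accuracy`, `pearson`, and `sessions` are hypothetical.

```python
from math import sqrt


def phoneme_accuracy(n_ref, subs, dels, ins):
    """Phoneme accuracy in percent, assuming (N - S - D - I) / N * 100."""
    return 100.0 * (n_ref - subs - dels - ins) / n_ref


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical recording sessions of one speaker (toy values, not real data):
# (speaking rate, substitutions, deletions, insertions)
N_REF = 1000  # assumed number of reference phonemes per session
sessions = [
    (6.0, 80, 10, 50),
    (6.5, 120, 20, 40),
    (7.0, 60, 30, 30),
    (7.5, 110, 40, 20),
    (8.0, 70, 50, 10),
]

rates = [s[0] for s in sessions]
subs = [s[1] for s in sessions]
dels = [s[2] for s in sessions]
ins = [s[3] for s in sessions]
acc = [phoneme_accuracy(N_REF, s, d, i) for _, s, d, i in sessions]

if __name__ == "__main__":
    print(f"rate vs deletions:      {pearson(rates, dels):+.2f}")
    print(f"rate vs insertions:     {pearson(rates, ins):+.2f}")
    print(f"rate vs accuracy:       {pearson(rates, acc):+.2f}")
    print(f"substitutions vs acc.:  {pearson(subs, acc):+.2f}")
```

In this toy data the sum of deletions and insertions is the same in every session, so the speaking rate correlates strongly with each error type but only weakly with accuracy, while substitutions alone drive the accuracy differences.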
Pages: 33 / +
Page count: 2