A combined cepstral distance method for emotional speech recognition

Cited by: 12
Authors
Quan, Changqin [1 ]
Zhang, Bin [2 ]
Sun, Xiao [2 ]
Ren, Fuji [3 ]
Affiliations
[1] Kobe Univ, Grad Sch Syst Informat, Kobe, Hyogo, Japan
[2] Hefei Univ Technol, Dept Comp & Informat Sci, Hefei, Peoples R China
[3] Univ Tokushima, Fac Engn, Tokushima, Japan
Source
INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS | 2017, Vol. 14, No. 4
Funding
National Natural Science Foundation of China;
Keywords
Cepstral distance; emotional speech recognition; two-group classification; principal component analysis; SPECTRAL FEATURES; NEURAL-NETWORKS; SCHEME; INFORMATION; PARAMETERS; SIGNALS; MODELS; SET;
DOI
10.1177/1729881417719836
Chinese Library Classification (CLC)
TP24 [Robotics technology];
Subject classification codes
080202; 1405;
Abstract
Affective computing is both a major direction of development in artificial intelligence and a hallmark of advanced intelligent machines. Emotion remains one of the biggest differences between humans and machines; a machine that behaves with emotion will be accepted by more people. Voice is the most natural, most easily understood, and most widely accepted medium of daily communication, and the recognition of emotion in voice is an important field of artificial intelligence. In emotion recognition, however, certain pairs of emotions are particularly prone to confusion. This article presents a combined cepstral distance method for two-group multi-class emotion classification in emotional speech recognition. Cepstral distance combined with speech energy is widely used for endpoint detection of speech signals in speech recognition. In this work, cepstral distance is used to measure the similarity between frames of emotional signals and frames of neutral signals. These features are fed into a directed acyclic graph support vector machine (DAG-SVM) classifier, and a two-group classification strategy is then adopted to resolve confusion in multi-class emotion recognition. In the experiments, a Chinese Mandarin emotion database is used, and a large training set (1134 + 378 utterances) provides strong modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate for the emotion 'sad' and balances the recognition results while eliminating overfitting. On the German Berlin emotional speech database, the recognition rate between 'sad' and 'bored', two emotions that are very difficult to distinguish, reaches up to 95.45%.
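
Below is a minimal sketch of the frame-level cepstral-distance feature described in the abstract, assuming MFCCs as the cepstral representation and scikit-learn's standard one-vs-one SVC as a stand-in for the paper's DAG-SVM; the file paths, frame parameters, and pooling statistics are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import librosa
from sklearn.svm import SVC

def cepstra(path, sr=16000, n_mfcc=13):
    # Per-frame cepstral (MFCC) vectors, shape (n_frames, n_mfcc).
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def cepstral_distance_features(emotional_path, neutral_path):
    # Euclidean cepstral distance between each emotional frame and the
    # corresponding frame of a neutral reference utterance, pooled into
    # a fixed-length feature vector (the pooling choice is an assumption).
    emo = cepstra(emotional_path)
    neu = cepstra(neutral_path)
    n = min(len(emo), len(neu))            # align by truncating to the shorter signal
    d = np.linalg.norm(emo[:n] - neu[:n], axis=1)
    return np.array([d.mean(), d.std(), d.max(), np.median(d)])

# Hypothetical usage: stack features over utterance pairs and train a
# multi-class SVM (one-vs-one here, where the paper uses a DAG-SVM).
# X = np.vstack([cepstral_distance_features(e, n) for e, n in utterance_pairs])
# clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, labels)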
Pages: 1-9
Number of pages: 9
Related papers
50 records in total
  • [31] Emotional speech recognition based on modified parameter and distance of statistical model of pitch
    Department of Radio Engineering, Southeast University, Nanjing 210096, China
    Shengxue Xuebao, 2006, (1): 28-34
  • [33] Speech recognition system using enhanced mel frequency cepstral coefficient with windowing and framing method
    Lokesh, S.
    Devi, M. Ramya
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 5): : 11669 - 11679
  • [34] Linear Frequency Residual Cepstral Coefficients for Speech Emotion Recognition
    Hora, Baveet Singh
    Uthiraa, S.
    Patil, Hemant A.
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 116 - 129
  • [35] NMF-based Cepstral Features for Speech Emotion Recognition
    Lashkari, Milad
    Seyedin, Sanaz
    2018 4TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2018, : 189 - 193
  • [36] A Combined Speaker Adaptation Method for Mandarin Speech Recognition
    徐向华
    朱杰
    Journal of Shanghai Jiaotong University, 2004, (04) : 21 - 24
  • [37] Whispered Speech Recognition Based on Gammatone Filterbank Cepstral Coefficients
    Markovic, B.
    Galic, J.
    Grozdic, D.
    Jovicic, S. T.
    Mijic, M.
    JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS, 2017, 62 (11) : 1255 - 1261
  • [38] Bounded cepstral marginalization of missing data for robust speech recognition
    Kafoori, Kian Ebrahim
    Ahadi, Seyed Mohammad
    COMPUTER SPEECH AND LANGUAGE, 2016, 36 : 1 - 23
  • [39] Cepstral amplitude range normalization for noise robust speech recognition
    Yoshizawa, S
    Hayasaka, N
    Wada, N
    Miyanaga, Y
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (08): : 2130 - 2137
  • [40] Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation
    Haderlein, Tino
    Moers, Cornelia
    Moebius, Bernd
    Rosanowski, Frank
    Noeth, Elmar
    TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836 : 195 - 202