A combined cepstral distance method for emotional speech recognition

Cited by: 12
Authors
Quan, Changqin [1]
Zhang, Bin [2]
Sun, Xiao [2]
Ren, Fuji [3]
Affiliations
[1] Kobe Univ, Grad Sch Syst Informat, Kobe, Hyogo, Japan
[2] Hefei Univ Technol, Dept Comp & Informat Sci, Hefei, Peoples R China
[3] Univ Tokushima, Fac Engn, Tokushima, Japan
Funding
National Natural Science Foundation of China
Keywords
Cepstral distance; emotional speech recognition; two-group classification; principal component analysis; spectral features; neural networks; scheme; information; parameters; signals; models; set
DOI
10.1177/1729881417719836
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Affective computing is not only a direction of reform in artificial intelligence but also an exemplification of advanced intelligent machines. Emotion is the biggest difference between humans and machines, and machines that behave with emotion will be accepted by more people. Voice is the most natural mode of daily communication and is easily understood and accepted, so the recognition of emotional speech is an important field of artificial intelligence. In emotion recognition, however, two particular emotions are often especially easy to confuse with each other. This article presents a combined cepstral distance method that uses two-group multi-class emotion classification for emotional speech recognition. In speech recognition, cepstral distance combined with speech energy is widely used for speech endpoint detection. In this work, cepstral distance is instead used to measure the similarity between frames of emotional signals and frames of neutral signals. These features are input to a directed acyclic graph support vector machine (DAG-SVM) classifier. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. The experiments use a Chinese Mandarin emotion database, with a large training set (1134 + 378 utterances) that ensures a powerful modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate for sadness and balances the recognition results by eliminating overfitting. On the German Berlin emotional speech database, the recognition rate between sadness and boredom, two emotions that are very difficult to distinguish, reaches 95.45%.
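The abstract does not spell out the exact feature definition, so the following is only a minimal Python/NumPy sketch under stated assumptions, not the authors' method. It assumes one common form of the cepstral distance from the endpoint-detection literature, d = (10 / ln 10) · sqrt((c1_0 − c2_0)² + 2 Σ_{n≥1} (c1_n − c2_n)²) (the constant is about 4.3429, giving dB), computed on real-cepstrum coefficients. It then summarizes the per-frame distances between an emotional utterance and the mean cepstrum of a neutral utterance as mean, standard deviation, minimum and maximum, and shows how a decision DAG combines K − 1 pairwise decisions to pick one of K classes. The function names (`frame_signal`, `real_cepstrum`, `neutral_distance_features`, `dag_predict`), the frame and hop sizes, and the number of coefficients are hypothetical choices.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    Assumed defaults: 25 ms frames with a 10 ms hop at 16 kHz.
    Requires len(x) >= frame_len.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def real_cepstrum(frames, n_coeffs=13, eps=1e-10):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum, truncated."""
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=-1)) + eps)
    return np.fft.irfft(log_mag, axis=-1)[..., :n_coeffs]


def cepstral_distance(c1, c2):
    """Assumed form: (10 / ln 10) * sqrt((c1_0 - c2_0)^2 + 2 * sum_{n>=1} (c1_n - c2_n)^2)."""
    diff = np.asarray(c1) - np.asarray(c2)
    sq = diff[..., 0] ** 2 + 2.0 * np.sum(diff[..., 1:] ** 2, axis=-1)
    return (10.0 / np.log(10.0)) * np.sqrt(sq)


def neutral_distance_features(emotional, neutral):
    """Hypothetical utterance-level features: statistics of the per-frame cepstral
    distance between an emotional utterance and the mean cepstrum of a neutral one."""
    reference = real_cepstrum(frame_signal(neutral)).mean(axis=0)
    distances = cepstral_distance(real_cepstrum(frame_signal(emotional)), reference)
    return np.array([distances.mean(), distances.std(), distances.min(), distances.max()])


def dag_predict(x, classes, pairwise):
    """Decision-DAG evaluation over K classes using K - 1 pairwise decisions.

    pairwise[(a, b)], with a listed before b in `classes`, is a callable that
    returns a or b; in a DAG-SVM each one would be a trained binary SVM.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if pairwise[(first, last)](x) == first:
            remaining.pop()  # `last` loses and is eliminated
        else:
            remaining.pop(0)  # `first` loses and is eliminated
    return remaining[0]


if __name__ == "__main__":
    # Synthetic stand-ins for a neutral and an emotional utterance (1 s at 16 kHz).
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    neutral = np.sin(2 * np.pi * 150 * t) + 0.01 * rng.standard_normal(sr)
    emotional = np.sin(2 * np.pi * 250 * t) + 0.01 * rng.standard_normal(sr)
    print(neutral_distance_features(emotional, neutral))
```

In a full pipeline, the pairwise callables would be binary SVMs trained on features of this kind; the two-group strategy described in the abstract is not reproduced here.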
Pages: 1-9
Number of pages: 9