Speech emotion recognition based on multi-feature speed rate and LSTM

Times Cited: 4
Authors
Yang, Zijun [1 ]
Li, Zhen [1 ]
Zhou, Shi [2 ]
Zhang, Lifeng [1 ]
Serikawa, Seiichi [1 ]
Affiliations
[1] Kyushu Inst Technol, 1-1 Sensuicho,Tobata Ward, Kitakyushu, Fukuoka 8040011, Japan
[2] Huzhou Univ, 759,East 2nd Rd, Huzhou 313000, Zhejiang, Peoples R China
Keywords
Speech emotion recognition; LSTM; Voiced sound; Phonogram; Short-time features; DEPRESSION; SEVERITY; SIGNALS
DOI
10.1016/j.neucom.2024.128177
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Correctly recognizing speech emotions is of significant importance in various fields, such as healthcare and human-computer interaction (HCI). However, the complexity of speech signal features poses challenges for speech emotion recognition. This study introduces a novel multi-feature method for speech emotion recognition that combines short-time and rhythmic (speech-rate) features. Using short-time energy, the zero-crossing rate, and the average amplitude difference, the proposed approach effectively mitigates overfitting by reducing feature dimensionality. Employing a long short-term memory (LSTM) network, the experiments achieved notable accuracy across diverse datasets: up to 98.47% on the CASIA dataset, 100% on the Emo-DB dataset, and 98.87% on the EMOVO dataset, demonstrating the method's ability to discern speaker emotions across different languages and emotion classes. These findings underscore the value of incorporating speech rate when recognizing emotional content, which holds promise for applications in HCI and auxiliary medical diagnostics.
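The abstract names three frame-level (short-time) descriptors: short-time energy, the zero-crossing rate, and the average amplitude difference, which are stacked per frame and fed to an LSTM. As an illustration only, the NumPy sketch below computes these contours; the frame length, hop size, and single-lag formulation of the amplitude-difference feature are assumptions, since the record does not give the paper's exact settings.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (e.g. 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign changes, per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def average_amplitude_difference(frames, lag=1):
    """Mean absolute difference between each frame and its lagged copy (single-lag AMDF)."""
    return np.mean(np.abs(frames[:, lag:] - frames[:, :-lag]), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)   # 1 s of synthetic audio at 16 kHz
    frames = frame_signal(signal)
    # Stack the three contours into an (n_frames, 3) sequence for an LSTM.
    features = np.stack(
        [short_time_energy(frames),
         zero_crossing_rate(frames),
         average_amplitude_difference(frames)],
        axis=1,
    )
    print(features.shape)                  # e.g. (98, 3)
```

The resulting low-dimensional (n_frames, 3) sequence is the kind of compact representation the abstract refers to when it credits reduced feature dimensionality with mitigating overfitting.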
Pages: 12
Related Papers
50 records in total
  • [22] Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition
    Nie, Weizhi
    Yan, Yan
    Song, Dan
    Wang, Kun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 16205 - 16214
  • [23] EEG FEATURE EXTRACTION AND RECOGNITION BASED ON MULTI-FEATURE FUSION
    Sun, Jian
    Wu, Quanyu
    Gao, Nan
    Pan, Lingjiao
    Tao, Weige
    BIOMEDICAL ENGINEERING-APPLICATIONS BASIS COMMUNICATIONS, 2024, 36 (06):
  • [24] Feature-Enhanced Multi-Task Learning for Speech Emotion Recognition Using Decision Trees and LSTM
    Wang, Chun
    Shen, Xizhong
    ELECTRONICS, 2024, 13 (14)
  • [25] Attention-Based Dense LSTM for Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Zhao, Li
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07): 1426 - 1429
  • [26] Siamese Attention-Based LSTM for Speech Emotion Recognition
    Nizamidin, Tashpolat
    Zhao, Li
    Liang, Ruiyu
    Xie, Yue
    Hamdulla, Askar
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (07) : 937 - 941
  • [27] Speech emotion recognition based on multi-dimensional feature extraction and multi-scale feature fusion
    Yu, Lingli
    Xu, Fengjun
    Qu, Yundong
    Zhou, Kaijun
    APPLIED ACOUSTICS, 2024, 216
  • [28] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 40 - 44
  • [29] Intelligent Recognition of Fatigue and Sleepiness Based on InceptionV3-LSTM via Multi-Feature Fusion
    Zhao, Yifei
    Xie, Kai
    Zou, Zizhuang
    He, Jian-Biao
    IEEE ACCESS, 2020, 8 : 144205 - 144217
  • [30] Speech Emotion Recognition based on Multiple Feature Fusion
    Jiang, Changjiang
    Mao, Rong
    Liu, Geng
    Wang, Mingyi
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 907 - 912