Speech emotion recognition based on multi-feature speed rate and LSTM

Times Cited: 4
Authors
Yang, Zijun [1 ]
Li, Zhen [1 ]
Zhou, Shi [2 ]
Zhang, Lifeng [1 ]
Serikawa, Seiichi [1 ]
Affiliations
[1] Kyushu Inst Technol, 1-1 Sensuicho,Tobata Ward, Kitakyushu, Fukuoka 8040011, Japan
[2] Huzhou Univ, 759,East 2nd Rd, Huzhou 313000, Zhejiang, Peoples R China
Keywords
Speech emotion recognition; LSTM; Voiced sound; Phonogram; Short-time features; DEPRESSION; SEVERITY; SIGNALS
DOI
10.1016/j.neucom.2024.128177
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Correctly recognizing speech emotions is of significant importance in various fields, such as healthcare and human-computer interaction (HCI). However, the complexity of speech signal features poses challenges for speech emotion recognition. This study introduces a novel multi-feature method for speech emotion recognition that combines short-time and rhythmic (speech-rate) features. Using short-time energy, the zero-crossing rate, and the average amplitude difference, the proposed approach effectively mitigates overfitting by reducing feature dimensionality. Employing a long short-term memory (LSTM) network, the experiments achieved notable accuracy across diverse datasets: up to 98.47% on the CASIA dataset, 100% on the Emo-DB dataset, and 98.87% on the EMOVO dataset, demonstrating the method's ability to discern speaker emotions across different languages and emotion classes. These findings underscore the value of incorporating speech rate when recognizing emotional content, which holds promise for applications in HCI and auxiliary medical diagnostics.
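The abstract names three frame-level (short-time) descriptors: short-time energy, the zero-crossing rate, and the average amplitude difference, which are stacked per frame and fed to an LSTM. As an illustration only, the NumPy sketch below computes these contours; the frame length, hop size, and single-lag formulation of the amplitude-difference feature are assumptions, since the record does not give the paper's exact settings.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (e.g. 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign changes, per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def average_amplitude_difference(frames, lag=1):
    """Mean absolute difference between each frame and its lagged copy (single-lag AMDF)."""
    return np.mean(np.abs(frames[:, lag:] - frames[:, :-lag]), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)   # 1 s of synthetic audio at 16 kHz
    frames = frame_signal(signal)
    # Stack the three contours into an (n_frames, 3) sequence for an LSTM.
    features = np.stack(
        [short_time_energy(frames),
         zero_crossing_rate(frames),
         average_amplitude_difference(frames)],
        axis=1,
    )
    print(features.shape)                  # e.g. (98, 3)
```

The resulting low-dimensional (n_frames, 3) sequence is the kind of compact representation the abstract refers to when it credits reduced feature dimensionality with mitigating overfitting.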
Pages: 12
Related Papers
50 records in total
  • [22] Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition
    Nie, Weizhi
    Yan, Yan
    Song, Dan
    Wang, Kun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 16205 - 16214
  • [23] EEG FEATURE EXTRACTION AND RECOGNITION BASED ON MULTI-FEATURE FUSION
    Sun, Jian
    Wu, Quanyu
    Gao, Nan
    Pan, Lingjiao
    Tao, Weige
    BIOMEDICAL ENGINEERING-APPLICATIONS BASIS COMMUNICATIONS, 2024, 36 (06):
  • [24] Feature-Enhanced Multi-Task Learning for Speech Emotion Recognition Using Decision Trees and LSTM
    Wang, Chun
    Shen, Xizhong
    ELECTRONICS, 2024, 13 (14)
  • [25] Attention-Based Dense LSTM for Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Zhao, Li
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07): 1426 - 1429
  • [26] Siamese Attention-Based LSTM for Speech Emotion Recognition
    Nizamidin, Tashpolat
    Zhao, Li
    Liang, Ruiyu
    Xie, Yue
    Hamdulla, Askar
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (07) : 937 - 941
  • [27] Speech emotion recognition based on multi-dimensional feature extraction and multi-scale feature fusion
    Yu, Lingli
    Xu, Fengjun
    Qu, Yundong
    Zhou, Kaijun
    APPLIED ACOUSTICS, 2024, 216
  • [28] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 40 - 44
  • [29] Intelligent Recognition of Fatigue and Sleepiness Based on InceptionV3-LSTM via Multi-Feature Fusion
    Zhao, Yifei
    Xie, Kai
    Zou, Zizhuang
    He, Jian-Biao
    IEEE ACCESS, 2020, 8 : 144205 - 144217
  • [30] Speech Emotion Recognition based on Multiple Feature Fusion
    Jiang, Changjiang
    Mao, Rong
    Liu, Geng
    Wang, Mingyi
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 907 - 912