A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition

Times cited: 28

Authors:

Zhong, Ying [1,2]

Hu, Ying [1,2]

Huang, Hao [1,3]

Silamu, Wushour [1,3]

Affiliations:

[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China

[2] Key Lab Signal Detect & Proc Xinjiang Uygur Auton, Urumqi, Peoples R China

[3] Key Lab Multilingual Informat Technol Xinjiang Uy, Urumqi, Peoples R China

Source:

INTERSPEECH 2020 | 2020

Funding:

National Natural Science Foundation of China

Keywords:

Speech emotion recognition; lightweight; inverted residuals; focal loss

DOI:

10.21437/Interspeech.2020-2408

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

One of the major challenges in Speech Emotion Recognition (SER) is building a lightweight model from limited training data. In this paper, we propose a lightweight architecture with very few parameters, based on separable convolution and inverted residuals. Speech samples are often annotated by multiple raters. While sentences with clear emotional content are annotated consistently (easy samples), sentences with ambiguous emotional content show substantial disagreement between individual evaluations (hard samples). We assume that samples that are hard for humans are also hard for computers, and we address this problem with focal loss, which focuses learning on hard samples and down-weights easy ones. By incorporating an attention mechanism, the proposed network places greater emphasis on emotion-salient information. Our model achieves unweighted accuracies (UA) of 71.72% and 90.1% on the well-known IEMOCAP and Emo-DB corpora, respectively. The model with the fewest parameters that we are aware of is almost five times the size of our proposed model.
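The abstract names two mechanisms without spelling them out: focal loss for down-weighting easy samples, and separable convolution for cutting parameters. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. The loss uses the standard form introduced by Lin et al. for dense object detection, -α(1 - p_t)^γ log p_t, where p_t is the softmax probability of the true class. The values γ = 2 and α = 1 are illustrative only, because the abstract does not give them. The parameter counts compare a standard k×k convolution with a depthwise separable one, meaning a depthwise k×k convolution followed by a 1×1 pointwise convolution, with biases ignored. The function names, the example logits, and the 3×3, 64→128 channel configuration are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Multi-class focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t).

    gamma=2.0 and alpha=1.0 are illustrative assumptions, not values from the paper.
    With gamma=0 and alpha=1 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0, 0.0]  # confidently correct: p_t is high
    hard = [0.5, 0.4, 0.3, 0.2]  # ambiguous: p_t is close to 1/4
    for name, logits in [("easy", easy), ("hard", hard)]:
        ce = focal_loss(logits, 0, gamma=0.0)  # plain cross-entropy
        fl = focal_loss(logits, 0)             # focal loss, gamma=2
        print(f"{name}: CE={ce:.4f} FL={fl:.4f} ratio={fl / ce:.4f}")
    print(conv_params(3, 64, 128), separable_conv_params(3, 64, 128))
```

With these example logits, the modulating factor (1 - p_t)^2 is roughly 0.003 for the easy sample and roughly 0.5 for the hard one. The easy sample's contribution therefore shrinks far more, which is the down-weighting the abstract describes. In the hypothetical 3×3, 64→128 layer, a standard convolution needs 73,728 weights, while the separable version needs 8,768.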

Pages: 3331-3335

Number of pages: 5
