A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition

Cited by: 28
Authors
Zhong, Ying [1, 2]
Hu, Ying [1, 2]
Huang, Hao [1, 3]
Silamu, Wushour [1, 3]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc Xinjiang Uygur Auton, Urumqi, Peoples R China
[3] Key Lab Multilingual Informat Technol Xinjiang Uy, Urumqi, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; lightweight; inverted residuals; focal loss
DOI
10.21437/Interspeech.2020-2408
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
One of the major challenges in Speech Emotion Recognition (SER) is to build a lightweight model with limited training data. In this paper, we propose a lightweight architecture with few parameters, based on separable convolution and inverted residuals. Speech samples are often annotated by multiple raters. While sentences with clear emotional content are annotated consistently (easy samples), sentences with ambiguous emotional content show substantial disagreement between individual evaluations (hard samples). We assume that samples that are hard for humans are also hard for computers. We address this problem with focal loss, which focuses learning on hard samples and down-weights easy samples. By incorporating an attention mechanism, the proposed network can emphasize emotion-salient information. The proposed model achieves 71.72% and 90.1% unweighted accuracy (UA) on the well-known corpora IEMOCAP and Emo-DB, respectively. To the best of our knowledge, the existing model with the fewest parameters is almost 5 times the size of our proposed model.
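The two ideas named in the abstract, focal loss and separable convolution, can be illustrated with a minimal sketch. It is not the authors' implementation: the focal loss follows the standard form FL(p_t) = -(1 - p_t)^gamma * log(p_t), and the value gamma = 2.0, the layer sizes, and all function names are assumptions chosen for illustration. The parameter counts ignore bias terms.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample (standard form; gamma=2.0 is an assumed value).

    probs  : predicted class probabilities, summing to 1
    target : index of the true class
    """
    p_t = probs[target]
    # (1 - p_t)^gamma shrinks the loss of confident (easy) samples.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


# Hypothetical samples: an easy one (true class predicted at 0.9)
# and a hard one (true class predicted at 0.3).
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.3, 0.7], target=0)
print(f"easy sample loss: {easy:.5f}")
print(f"hard sample loss: {hard:.5f}")

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768
```

With gamma = 0 the function reduces to ordinary cross-entropy, so gamma controls how strongly easy samples are down-weighted. The second pair of functions shows why separable convolution suits a lightweight model: for this hypothetical layer the weight count drops by roughly a factor of eight.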
Pages: 3331-3335
Number of pages: 5