A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition

Cited by: 28
Authors
Zhong, Ying [1 ,2 ]
Hu, Ying [1 ,2 ]
Huang, Hao [1 ,3 ]
Silamu, Wushour [1 ,3 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc Xinjiang Uygur Auton, Urumqi, Peoples R China
[3] Key Lab Multilingual Informat Technol Xinjiang Uy, Urumqi, Peoples R China
Source
Interspeech 2020
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; lightweight; inverted residuals; focal loss
DOI
10.21437/Interspeech.2020-2408
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
One of the major challenges in Speech Emotion Recognition (SER) is building a lightweight model from limited training data. In this paper, we propose a lightweight architecture with few parameters, based on separable convolution and inverted residuals. Speech samples are often annotated by multiple raters. Sentences with clear emotional content are annotated consistently (easy samples), whereas sentences with ambiguous emotional content show substantial disagreement between individual evaluations (hard samples). We assume that samples that are hard for humans are also hard for computers. We address this problem with focal loss, which concentrates learning on hard samples and down-weights easy ones. By incorporating an attention mechanism, the proposed network emphasizes emotion-salient information. Our model achieves an unweighted accuracy (UA) of 71.72% on IEMOCAP and 90.1% on Emo-DB, two well-known corpora. The model with the fewest parameters that we are aware of is almost five times the size of ours.
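The abstract relies on three mechanisms without spelling them out: focal loss, unweighted accuracy (UA), and the parameter savings of separable convolution. The plain-Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. It assumes the standard multi-class focal loss introduced by Lin et al. for object detection, FL(p_t) = -α(1 - p_t)^γ log(p_t), with a scalar α and γ = 2 as illustrative defaults (the paper's actual settings are not given in this record). It takes UA to be the mean of per-class recalls and counts convolution weights without biases. All function names are hypothetical.

```python
import math
from collections.abc import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one sample's class logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]


def focal_loss(logits: Sequence[Sequence[float]], labels: Sequence[int],
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Mean multi-class focal loss, -alpha * (1 - p_t)**gamma * log(p_t).

    p_t is the softmax probability of the true class. With gamma = 0 and
    alpha = 1 this is ordinary cross-entropy. A larger gamma shrinks the loss
    of confidently correct (easy) samples far more than that of hard ones.
    Assumption: scalar alpha and gamma = 2 are illustrative defaults only.
    """
    total = 0.0
    for row, y in zip(logits, labels):
        log_p = log_softmax(row)[y]  # log p_t for the true class
        p_t = math.exp(log_p)
        total += -alpha * (1.0 - p_t) ** gamma * log_p
    return total / len(labels)


def unweighted_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """UA: recall of each class present in labels, averaged with equal weight."""
    recalls = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        hits = sum(1 for i in idx if preds[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    easy, hard = [[4.0, 0.0, 0.0, 0.0]], [[0.5, 0.0, 0.0, 0.0]]
    for name, x in (("easy", easy), ("hard", hard)):
        ce = focal_loss(x, [0], gamma=0.0)  # gamma = 0: plain cross-entropy
        fl = focal_loss(x, [0])
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.6f}")
    print("UA:", unweighted_accuracy([0, 0, 0, 0], [0, 0, 0, 1]))
    print("params:", conv_params(3, 64, 64), separable_conv_params(3, 64, 64))
```

With α = 1, the ratio of cross-entropy to focal loss is 1/(1 - p_t)^γ. The easy sample's loss therefore shrinks by a factor of roughly 370, and the hard sample's by only about 2.4. A classifier that always predicts the majority class reaches 75% plain accuracy on the toy labels but only 0.5 UA, which is why UA is preferred on class-imbalanced corpora. For a 3 x 3 layer with 64 input and 64 output channels, the separable version needs 4,672 weights instead of 36,864, about 7.9 times fewer.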
Pages: 3331-3335
Number of pages: 5