Self-labeling with feature transfer for speech emotion recognition

Cited by: 11
Authors
Wen, Guihua [1]
Liao, Huiqiang [1]
Li, Huihui [2]
Wen, Pengchen [3]
Zhang, Tong [1]
Gao, Sande [4]
Wang, Bao [4]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
[3] Hubei Minzu Univ, Sch Informat Engn, Enshi, Hubei, Peoples R China
[4] Affiliated TCM Hosp Guangzhou Med Univ, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Deep neural network; Self-labeled; Speech frame; Transfer learning; REPRESENTATION;
DOI
10.1016/j.knosys.2022.109589
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most frame-based speech emotion recognition methods have obtained good results in many applications. However, they segment each speech sample into smaller frames and label every frame with the same emotional tag as the speech sample. This is inconsistent with the possibility that a speech sample contains several emotional categories at the same time. Thus, this paper proposes a self-labeling (SL) learning method for speech emotion recognition, which automatically segments each speech sample into frames and then labels them with the corresponding emotional tags, where the compatibility of these tags is also checked. Then, a time-frequency deep neural network for speech emotion recognition is designed and trained. As most speech emotion datasets are very small, a feature transfer model trained on large-scale audio data is applied to further enhance the performance of the SL learning method. Experimental results on various datasets demonstrate the effectiveness of the proposed method. (C) 2022 Published by Elsevier B.V.
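To make the labeling issue described in the abstract concrete, the following is a minimal sketch of the conventional frame-based setup that the abstract criticizes: each utterance is cut into fixed-length frames and every frame inherits the utterance-level tag. It is not the paper's self-labeling method. The function name, the parameters frame_len and hop_len, and the toy signal are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def frame_with_inherited_label(
    samples: Sequence[float],
    label: str,
    frame_len: int,
    hop_len: int,
) -> List[Tuple[List[float], str]]:
    """Split one utterance into fixed-length frames that all inherit its label.

    Hypothetical helper: illustrates the baseline labeling scheme only.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    # Slide a window over the signal; a trailing partial window is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append((list(samples[start:start + frame_len]), label))
    return frames


# Toy utterance of 8 samples tagged "angry" at the utterance level (made-up data).
utterance = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
frames = frame_with_inherited_label(utterance, "angry", frame_len=4, hop_len=2)
```

Every frame in the result carries the tag "angry", even if part of the utterance were neutral. According to the abstract, the self-labeling method instead assigns tags per frame and checks their compatibility.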
Pages: 10
Related papers
50 in total
  • [41] Speech emotion recognition using a novel feature set
    Yang, J. (jsjyj0801@163.com), 1600, Binary Information Press, P.O. Box 162, Bethel, CT 06801-0162, United States (09):
  • [42] Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech
    Leem, Seong-Gyun
    Fulford, Daniel
    Onnela, Jukka-Pekka
    Gard, David
    Busso, Carlos
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 917 - 929
  • [43] An algorithm study for speech emotion recognition based speech feature analysis
    Zhengbiao, Ji
    Feng, Zhou
    Ming, Zhu
    International Journal of Multimedia and Ubiquitous Engineering, 2015, 10 (11): : 33 - 42
  • [44] Complex Feature Information Enhanced Speech Emotion Recognition
    Yue, Pengcheng
    Zheng, Shukai
    Li, Taihao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 941 - 946
  • [45] A novel feature selection method for speech emotion recognition
    Ozseven, Turgut
    APPLIED ACOUSTICS, 2019, 146 : 320 - 326
  • [46] Speech Emotion Recognition Using Transfer Learning
    Song, Peng
    Jin, Yun
    Zhao, Li
    Xin, Minghai
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (09): : 2530 - 2532
  • [47] Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability
    Ahn, Youngdo
    Han, Sangwook
    Lee, Seonggyu
    Shin, Jong Won
    SENSORS, 2024, 24 (13)
  • [48] Self-attention for Speech Emotion Recognition
    Tarantino, Lorenzo
    Garner, Philip N.
    Lazaridis, Alexandros
    INTERSPEECH 2019, 2019, : 2578 - 2582
  • [49] Linked Source and Target Domain Subspace Feature Transfer Learning - Exemplified by Speech Emotion Recognition
    Deng, Jun
    Zhang, Zixing
    Schuller, Bjoern
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 761 - 766
  • [50] Self-labeling and Mental Health Service Use
    Thoits, Peggy A.
    SOCIETY AND MENTAL HEALTH, 2024,