Connecting Subspace Learning and Extreme Learning Machine in Speech Emotion Recognition

Cited by: 35
Authors
Xu, Xinzhou [1 ,2 ,3 ]
Deng, Jun [4 ]
Coutinho, Eduardo [5 ,6 ]
Wu, Chen [1 ]
Zhao, Li [2 ]
Schuller, Bjoern W. [7 ,8 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing 210003, Jiangsu, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Underwater Acoust Signal Proc, Nanjing 210096, Jiangsu, Peoples R China
[3] Tech Univ Munich, MMK, Machine Intelligence & Signal Proc Grp, D-80290 Munich, Germany
[4] AudEERING GmbH, D-82205 Gilching, Germany
[5] Imperial Coll London, Dept Comp, London SW7 2AZ, England
[6] Univ Liverpool, Dept Mus, Liverpool L69 3BX, Merseyside, England
[7] Imperial Coll London, Grp Language Audio & Mus, London SW7 2AZ, England
[8] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
Funding
EU Horizon 2020;
Keywords
Speech emotion recognition; extreme learning machine; subspace learning; graph embedding; spectral regression; DISCRIMINANT-ANALYSIS; SPECTRAL REGRESSION; GENERAL FRAMEWORK; APPROXIMATION; ALGORITHMS; SELECTION;
DOI
10.1109/TMM.2018.2865834
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Speech emotion recognition (SER) is a powerful tool for endowing computers with the capacity to process information about the affective states of users in human-machine interactions. Recent research has shown the effectiveness of graph embedding-based subspace learning and extreme learning machines applied to SER, but both techniques still have drawbacks that limit their application. In subspace learning, the change from linearity to nonlinearity is usually achieved through kernelization, whereas extreme learning machines only take label information into consideration at the output layer. To overcome these drawbacks, this paper leverages extreme learning machines for dimensionality reduction and proposes a novel framework that combines spectral regression-based subspace learning and extreme learning machines. The proposed framework contains three stages: data mapping, graph decomposition, and regression. At the data mapping stage, various mapping strategies provide different views of the samples. At the graph decomposition stage, specifically designed embedding graphs make it possible to better represent the structure of the data by generating virtual coordinates. Finally, at the regression stage, dimension-reduced mappings are obtained by connecting the virtual coordinates and the data mapping. Using this framework, we propose several novel dimensionality reduction algorithms, apply them to SER tasks, and compare their performance to relevant state-of-the-art methods. Our results on several paralinguistic corpora show that our proposed techniques lead to significant improvements.
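The three stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `fit_elm_spectral_regression`, the tanh random hidden layer, the same-class adjacency graph, and the ridge penalty are all assumptions chosen for illustration, and the paper's specifically designed graphs and mapping strategies are not reproduced here.

```python
import numpy as np

def fit_elm_spectral_regression(X, y, n_hidden=64, n_components=2,
                                ridge=1e-2, seed=0):
    """Hypothetical three-stage sketch: mapping, graph decomposition, regression."""
    rng = np.random.default_rng(seed)

    # Stage 1 (data mapping): ELM-style random hidden layer, weights never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)

    def hidden(A):
        return np.tanh(A @ W + b)

    H = hidden(X)

    # Stage 2 (graph decomposition): assumed same-class adjacency graph,
    # symmetrically normalised; its leading eigenvectors serve as the
    # "virtual coordinates" of the training samples.
    adj = (y[:, None] == y[None, :]).astype(float)
    deg = adj.sum(axis=1)
    S = adj / np.sqrt(np.outer(deg, deg))
    _, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    Y = eigvecs[:, -n_components:]          # keep the top n_components

    # Stage 3 (regression): ridge regression from hidden outputs to the
    # virtual coordinates yields the dimension-reduced mapping.
    B = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y)

    def project(A):
        return hidden(A) @ B

    return project, Y
```

With two classes and `n_components=2`, the virtual coordinates are constant within each class, so the regression stage pulls same-class samples toward a common target; `project` can then be applied to unseen feature vectors before a downstream classifier.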
Pages: 795 - 808
Number of pages: 14