Robotic Emotion Recognition Using Two-Level Features Fusion in Audio Signals of Speech

Cited by: 1
Author
Li, Chang [1 ]
Affiliation
[1] Univ Elect Sci & Technol China, Sch Commun & Informat Engn, Chengdu 611731, Peoples R China
Keywords
Data enhancement; feature fusion; VGGish; MFCC; BiLSTM; speech signal;
DOI
10.1109/JSEN.2021.3065012
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809
Abstract
Speech emotion recognition (SER) is a challenging task, since the emotion expressed in an utterance is often ambiguous. Previous work mainly focuses on extracting hand-crafted features from audio signals and feeding them into shallow models. Recently, the Visual Geometry Group-like (VGGish) network has replaced traditional feature extractors because of its effectiveness; its feature vectors can be viewed as features selected by a deep neural network (DNN) from a large number of candidates. Although existing studies on SER have achieved promising results, they use only single-level features. This paper proposes a speech-based emotion recognition system, Later Feature Fusion with VGGish Overlap (LFFVO), which uses two-level features with position information to address these limitations. The position information of the two-level features is first extracted by a Bidirectional Long Short-Term Memory (BiLSTM) network, and the features are then fused to predict the emotion. The proposed method improves accuracy from 48.2% (baseline) to 69.5% when trained, validated, and evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database.
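The abstract describes the architecture only at a high level. The sketch below is one plausible reading of it in PyTorch: two BiLSTM branches, one over a sequence of VGGish embeddings and one over frame-level MFCCs (the two feature levels suggested by the keywords), whose summaries are fused late and classified. All dimensions, the use of the final hidden state as a summary, and the four-class output are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn

class TwoLevelLateFusionSER(nn.Module):
    """Sketch of a two-branch BiLSTM with late feature fusion for SER."""

    def __init__(self, vggish_dim=128, mfcc_dim=40, hidden=128, num_classes=4):
        super().__init__()
        # Branch 1: sequence of VGGish embeddings (deep, segment-level features)
        self.vggish_lstm = nn.LSTM(vggish_dim, hidden, batch_first=True,
                                   bidirectional=True)
        # Branch 2: sequence of frame-level MFCC vectors (hand-crafted features)
        self.mfcc_lstm = nn.LSTM(mfcc_dim, hidden, batch_first=True,
                                 bidirectional=True)
        # Late fusion: concatenate the two branch summaries, then classify
        self.classifier = nn.Linear(4 * hidden, num_classes)

    def forward(self, vggish_seq, mfcc_seq):
        # vggish_seq: (batch, T1, 128); mfcc_seq: (batch, T2, 40) -- assumed dims
        v_out, _ = self.vggish_lstm(vggish_seq)
        m_out, _ = self.mfcc_lstm(mfcc_seq)
        # Use the final time step of each BiLSTM as a fixed-length summary
        fused = torch.cat([v_out[:, -1, :], m_out[:, -1, :]], dim=-1)
        return self.classifier(fused)

# Example: 2 utterances with 10 VGGish frames and 300 MFCC frames each
model = TwoLevelLateFusionSER()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 300, 40))
print(logits.shape)  # torch.Size([2, 4])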
Pages: 17447-17454
Number of pages: 8
Related Papers
(50 in total)
  • [21] Emotion Recognition On Speech Signals Using Machine Learning
    Ghai, Mohan
    Lal, Shamit
    Duggal, Shivam
    Manik, Shrey
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS AND COMPUTATIONAL INTELLIGENCE (ICBDAC), 2017, : 34 - 39
  • [22] Audio Segmentation for Speech Recognition Using Segment Features
    Rybach, David
    Gollan, Christian
    Schlueter, Ralf
    Ney, Hermann
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4197 - 4200
  • [23] Speech emotion recognition based on prosodic segment level features
    Han, Wenjing
    Li, Haifeng
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (SUPPL. 1): 1363 - 1368
  • [24] Improved speech emotion recognition based on music-related audio features
    Vu, Linh
    Phan, Raphael C-W
    Han, Lim Wern
    Phung, Dinh
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 120 - 124
  • [25] Speech Emotion Recognition using SVM with thresholding fusion
    Gupta, Shilpi
    Mehra, Anu
    Vinay
    2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2015, : 570 - 574
  • [26] Fusion of Global Statistical and Segmental Spectral Features for Speech Emotion Recognition
    Hu, Hao
    Xu, Ming-Xing
    Wu, Wei
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1013 - 1016
  • [27] Performance Analysis of Spectral and Prosodic Features and Their Fusion for Emotion Recognition in Speech
    Gaurav, Manish
    2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 313 - 316
  • [28] Continuous Music Emotion Recognition Using Selected Audio Features
    Chmulik, Michal
    Jarina, Roman
    Kuba, Michal
    Lieskovska, Eva
    2019 42ND INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2019, : 589 - 592
  • [29] Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals
    Guizzo, Eric
    Weyde, Tillman
    Leveson, Jack Barnett
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6489 - 6493
  • [30] Emotion Recognition in Speech Using MFCC and Wavelet Features
    Kishore, K. V. Krishna
    Satish, P. Krishna
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 842 - 847