Robotic Emotion Recognition Using Two-Level Features Fusion in Audio Signals of Speech

Authors
Li, Chang [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Commun & Informat Engn, Chengdu 611731, Peoples R China
Keywords
Data enhancement; feature fusion; VGGish; MFCC; BiLSTM; speech signal
DOI
10.1109/JSEN.2021.3065012
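Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract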
Speech emotion recognition (SER) is a challenging task because the definition of emotions in sentences is ambiguous. Previous research has mainly focused on extracting hand-crafted features from audio signals and feeding them into shallow models. Recently, the Visual Geometry Group-like model (VGGish) has replaced traditional feature extractors because of its effectiveness. VGGish feature vectors can be viewed as features selected by a deep neural network (DNN) from a large number of candidate features. Although existing SER studies have achieved promising results, they use only single-level features. To address this limitation, this paper proposes a speech-based emotion recognition system that uses two-level features with position information, named Later Feature Fusion with VGGish Overlap (LFFVO). First, a bidirectional long short-term memory (BiLSTM) neural network extracts the position information from the two-level features; the features are then fused to predict the emotion. The proposed method improved accuracy from 48.2% (baseline) to 69.5% when trained, validated, and evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database.
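The pipeline outlined in the abstract (one bidirectional recurrent pass per feature level, then fusion, then a prediction) can be illustrated with a minimal sketch. This is not the authors' LFFVO implementation: a plain tanh recurrent cell stands in for the BiLSTM, the weights are random rather than trained, and the 13 MFCC coefficients, hidden size of 8, four emotion classes, and all function names are assumptions chosen for illustration. The 128-dimensional frames match the embedding size VGGish produces.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def recurrent_pass(frames, w_in, w_rec):
    """Run a tanh recurrent cell over the frames and return the last state.

    Simplified stand-in for one direction of a BiLSTM (no gates).
    """
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        drive = matvec(w_in, frame)
        memory = matvec(w_rec, hidden)
        hidden = [math.tanh(d + m) for d, m in zip(drive, memory)]
    return hidden


def init_branch(input_dim, hidden_dim, rng):
    """Separate forward and backward weights for one feature stream."""
    return {
        direction: (random_matrix(hidden_dim, input_dim, rng),
                    random_matrix(hidden_dim, hidden_dim, rng))
        for direction in ("forward", "backward")
    }


def bidirectional_summary(frames, branch):
    """Concatenate the final states of a forward and a backward pass."""
    forward = recurrent_pass(frames, *branch["forward"])
    backward = recurrent_pass(frames[::-1], *branch["backward"])
    return forward + backward


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_predict(mfcc_frames, vggish_frames,
                        mfcc_branch, vggish_branch, w_out):
    """Summarize each stream separately, then fuse by concatenation."""
    fused = (bidirectional_summary(mfcc_frames, mfcc_branch)
             + bidirectional_summary(vggish_frames, vggish_branch))
    return softmax(matvec(w_out, fused))


rng = random.Random(0)
HIDDEN, NUM_CLASSES = 8, 4  # assumed sizes, not taken from the paper

# Synthetic inputs: 20 MFCC frames (13 coefficients, assumed) and
# 5 VGGish frames (128-dimensional embeddings).
mfcc_frames = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]
vggish_frames = [[rng.gauss(0, 1) for _ in range(128)] for _ in range(5)]

mfcc_branch = init_branch(13, HIDDEN, rng)
vggish_branch = init_branch(128, HIDDEN, rng)
w_out = random_matrix(NUM_CLASSES, 4 * HIDDEN, rng)

probs = late_fusion_predict(mfcc_frames, vggish_frames,
                            mfcc_branch, vggish_branch, w_out)
print(probs)
```

Because each stream is summarized before concatenation, the two feature levels can have different frame rates and dimensions, which is the practical appeal of fusing late rather than at the input.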
Pages: 17447-17454
Page count: 8