Robotic Emotion Recognition Using Two-Level Features Fusion in Audio Signals of Speech

Cited by: 1
Authors
Li, Chang [1]
Affiliation
[1] Univ Elect Sci & Technol China, Sch Commun & Informat Engn, Chengdu 611731, Peoples R China
Keywords
Data enhancement; feature fusion; VGGish; MFCC; BiLSTM; speech signal
DOI
10.1109/JSEN.2021.3065012
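Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract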
Speech emotion recognition (SER) is a challenging task because the definition of emotions in sentences is ambiguous. Previous research has mainly focused on extracting hand-crafted features from audio signals and feeding them into shallow models. Recently, the Visual Geometry Group-like (VGGish) network has replaced traditional feature extractors owing to its effectiveness. VGGish feature vectors can be viewed as features that a deep neural network (DNN) has selected from a larger pool of features. Although existing SER studies have achieved promising results, they use only single-level features. To address this limitation, this paper proposes a speech-based emotion recognition system, Later Feature Fusion with VGGish Overlap (LFFVO), that uses two-level features with position information. First, the position information in the two-level features is extracted by a bidirectional long short-term memory (BiLSTM) neural network; the features are then fused to predict the emotion. Trained, validated, and evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, the proposed method improved accuracy from 48.2% (baseline) to 69.5%.
Pages: 17447-17454
Page count: 8
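The late-fusion idea described in the abstract can be illustrated with a minimal sketch. It assumes the two feature levels are frame-level MFCCs and VGGish embeddings, as the keywords suggest. Everything else is hypothetical: the label set, the function names, the tiny feature dimensions, and the random weights are placeholders, and simple mean pooling stands in for the BiLSTM encoders, so this is not the paper's LFFVO model.

```python
import math
import random

# Hypothetical label set, chosen only for illustration.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def mean_pool(frames):
    """Average a sequence of frame vectors into one utterance-level vector.

    Stand-in for a BiLSTM encoder: unlike a BiLSTM, it ignores frame order.
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def softmax(scores):
    """Turn raw scores into probabilities (max-shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_predict(mfcc_frames, vggish_frames, weights, bias):
    """Encode each stream separately, concatenate, then classify linearly."""
    fused = mean_pool(mfcc_frames) + mean_pool(vggish_frames)  # list concatenation
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy inputs: 5 MFCC frames of size 3 and 2 VGGish frames of size 4.
    mfcc = [[rng.random() for _ in range(3)] for _ in range(5)]
    vggish = [[rng.random() for _ in range(4)] for _ in range(2)]
    # Untrained random classifier weights: one row per emotion, 3 + 4 inputs.
    weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in EMOTIONS]
    bias = [0.0] * len(EMOTIONS)
    label, probs = late_fusion_predict(mfcc, vggish, weights, bias)
    print(label, [round(p, 3) for p in probs])
```

Each stream is summarized independently and the two summaries are combined only just before classification, which is what distinguishes late fusion from concatenating raw frames at the input.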