Robotic Emotion Recognition Using Two-Level Features Fusion in Audio Signals of Speech

Cited by: 1
Author
Li, Chang [1 ]
Affiliation
[1] Univ Elect Sci & Technol China, Sch Commun & Informat Engn, Chengdu 611731, Peoples R China
Keywords
Data enhancement; feature fusion; VGGish; MFCC; BiLSTM; speech signal;
DOI
10.1109/JSEN.2021.3065012
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809
Abstract
Speech emotion recognition (SER) is a challenging task, since the emotion expressed in an utterance is often ambiguous. Previous work mainly focuses on extracting hand-crafted features from audio signals and feeding them into shallow models. Recently, the Visual Geometry Group-like (VGGish) network has replaced traditional feature extractors because of its effectiveness; its feature vectors can be viewed as features selected by a deep neural network (DNN) from a large number of candidates. Although existing studies on SER have achieved promising results, they use only single-level features. This paper proposes a speech-based emotion recognition system, Later Feature Fusion with VGGish Overlap (LFFVO), which uses two-level features with position information to address these limitations. The position information of the two-level features is first extracted by a Bidirectional Long Short-Term Memory (BiLSTM) network, and the features are then fused to predict the emotion. The proposed method improves accuracy from 48.2% (baseline) to 69.5% when trained, validated, and evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database.
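The abstract describes the architecture only at a high level. The sketch below is one plausible reading of it in PyTorch: two BiLSTM branches, one over a sequence of VGGish embeddings and one over frame-level MFCCs (the two feature levels suggested by the keywords), whose summaries are fused late and classified. All dimensions, the use of the final hidden state as a summary, and the four-class output are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn

class TwoLevelLateFusionSER(nn.Module):
    """Sketch of a two-branch BiLSTM with late feature fusion for SER."""

    def __init__(self, vggish_dim=128, mfcc_dim=40, hidden=128, num_classes=4):
        super().__init__()
        # Branch 1: sequence of VGGish embeddings (deep, segment-level features)
        self.vggish_lstm = nn.LSTM(vggish_dim, hidden, batch_first=True,
                                   bidirectional=True)
        # Branch 2: sequence of frame-level MFCC vectors (hand-crafted features)
        self.mfcc_lstm = nn.LSTM(mfcc_dim, hidden, batch_first=True,
                                 bidirectional=True)
        # Late fusion: concatenate the two branch summaries, then classify
        self.classifier = nn.Linear(4 * hidden, num_classes)

    def forward(self, vggish_seq, mfcc_seq):
        # vggish_seq: (batch, T1, 128); mfcc_seq: (batch, T2, 40) -- assumed dims
        v_out, _ = self.vggish_lstm(vggish_seq)
        m_out, _ = self.mfcc_lstm(mfcc_seq)
        # Use the final time step of each BiLSTM as a fixed-length summary
        fused = torch.cat([v_out[:, -1, :], m_out[:, -1, :]], dim=-1)
        return self.classifier(fused)

# Example: 2 utterances with 10 VGGish frames and 300 MFCC frames each
model = TwoLevelLateFusionSER()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 300, 40))
print(logits.shape)  # torch.Size([2, 4])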
Pages: 17447-17454
Number of pages: 8
Related Papers
(50 in total)
  • [21] Emotion Recognition On Speech Signals Using Machine Learning
    Ghai, Mohan
    Lal, Shamit
    Duggal, Shivam
    Manik, Shrey
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS AND COMPUTATIONAL INTELLIGENCE (ICBDAC), 2017, : 34 - 39
  • [22] Audio Segmentation for Speech Recognition Using Segment Features
    Rybach, David
    Gollan, Christian
    Schlueter, Ralf
    Ney, Hermann
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4197 - 4200
  • [23] Speech emotion recognition based on prosodic segment level features
    Han, Wenjing
    Li, Haifeng
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (SUPPL. 1): 1363 - 1368
  • [24] Improved speech emotion recognition based on music-related audio features
    Vu, Linh
    Phan, Raphael C-W
    Han, Lim Wern
    Phung, Dinh
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 120 - 124
  • [25] Speech Emotion Recognition using SVM with thresholding fusion
    Gupta, Shilpi
    Mehra, Anu
    Vinay
    2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2015, : 570 - 574
  • [26] Fusion of Global Statistical and Segmental Spectral Features for Speech Emotion Recognition
    Hu, Hao
    Xu, Ming-Xing
    Wu, Wei
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1013 - 1016
  • [27] Performance Analysis of Spectral and Prosodic Features and Their Fusion for Emotion Recognition in Speech
    Gaurav, Manish
    2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 313 - 316
  • [28] Continuous Music Emotion Recognition Using Selected Audio Features
    Chmulik, Michal
    Jarina, Roman
    Kuba, Michal
    Lieskovska, Eva
    2019 42ND INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2019, : 589 - 592
  • [29] Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals
    Guizzo, Eric
    Weyde, Tillman
    Leveson, Jack Barnett
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6489 - 6493
  • [30] Emotion Recognition in Speech Using MFCC and Wavelet Features
    Kishore, K. V. Krishna
    Satish, P. Krishna
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 842 - 847