Speech Emotion Recognition using Convolutional Neural Network with Audio Word-based Embedding

Cited by: 0

Authors:

Huang, Kun-Yi [1]

Wu, Chung-Hsien [1]

Hong, Qian-Bei [2,3]

Su, Ming-Hsiang [1]

Zeng, Yuan-Rong [1]

Affiliations:

[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan

[2] Natl Cheng Kung Univ, PhD Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan

[3] Acad Sinica, Taipei, Taiwan

Source:

2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018

Keywords:

speech emotion recognition; convolutional neural network; audio word;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In natural conversation, a complete emotional expression typically unfolds over a complex temporal course. Existing research on utterance-level, segment-level, and multi-level processing lacks an understanding of the underlying relations in emotional speech. In this work, a convolutional neural network (CNN) with audio word-based embedding is proposed for emotion modeling. First, vector quantization with the k-means algorithm converts the low-level features of each speech frame into audio words. Word2vec is then used to convert an input speech utterance into the corresponding audio word vector sequence. Finally, the audio word vector sequences of the emotion-annotated training speech data are used to construct the CNN-based emotion model. The NCKU-ES database, containing seven emotion categories (happiness, boredom, anger, anxiety, sadness, surprise, and disgust), was collected, and five-fold cross-validation was used to evaluate the proposed CNN-based method for speech emotion recognition. Experimental results show that the proposed method achieved an emotion recognition accuracy of 82.34%, an improvement of 8.7% over the Long Short-Term Memory (LSTM)-based method, which struggles with long input sequences. Compared with raw features, the audio word-based embedding achieved a 3.4% improvement in speech emotion recognition.
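The pipeline described in the abstract (k-means vector quantization of frame-level features into audio words, embedding each audio word, then a CNN over the resulting vector sequence) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature dimension (13), codebook size (8), embedding size (16), and filter settings are hypothetical; the random embedding table stands in for a trained word2vec model; and the single convolution layer with ReLU and global max pooling is a generic choice, not the paper's architecture.

```python
import numpy as np


def quantize(features, codebook):
    """Map each frame (row) to the index of its nearest codeword: its 'audio word'."""
    # Squared Euclidean distance between every frame and every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm) returning a codebook of k centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct frames chosen at random.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = quantize(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def embed(word_ids, embedding_table):
    """Replace each audio-word ID with its embedding vector (shape: T x d)."""
    return embedding_table[word_ids]


def conv1d_maxpool(seq, kernels, bias):
    """Valid 1-D convolution over time (cross-correlation, as in CNN layers),
    ReLU, then global max pooling.

    seq: (T, d); kernels: (n_filters, width, d); bias: (n_filters,)
    Returns one pooled value per filter, independent of the sequence length T.
    """
    n_filters, width, _ = kernels.shape
    T = seq.shape[0]
    out = np.empty((T - width + 1, n_filters))
    for t in range(T - width + 1):
        window = seq[t:t + width]  # (width, d)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0).max(axis=0)


# Usage with hypothetical random data (all sizes are assumptions).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))   # 200 frames of 13-dim low-level features
codebook = kmeans(frames, k=8)        # audio-word codebook
words = quantize(frames, codebook)    # one audio word per frame
table = rng.normal(size=(8, 16))      # stand-in for a trained word2vec table
seq = embed(words, table)             # audio word vector sequence
kernels = rng.normal(size=(4, 3, 16))  # 4 filters spanning 3 frames each
pooled = conv1d_maxpool(seq, kernels, np.zeros(4))
print(words.shape, seq.shape, pooled.shape)  # (200,) (200, 16) (4,)
```

Global max pooling gives a fixed-size vector regardless of utterance length, which is one common way a CNN sidesteps the long-sequence difficulty the abstract attributes to the LSTM baseline; a classifier layer over `pooled` (omitted here) would produce the seven emotion scores.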


Pages: 265-269

Number of pages: 5
