Improving the performance of the speaker emotion recognition based on low dimension prosody features vector

Cited by: 6
Authors
Gudmalwar, Ashishkumar Prabhakar [1 ]
Rao, Ch V. Rama [1 ]
Dutta, Anirban [1 ]
Affiliation
[1] Natl Inst Technol, Shillong, Meghalaya, India
Keywords
Prosody; PCA; Emotion recognition; Recognition rate; SPEECH;
DOI
10.1007/s10772-018-09576-4
CLC classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline codes
0808; 0809;
Abstract
Speaker emotion recognition is an important research problem with many applications in human-robot and human-computer interaction. This work deals with recognizing the emotion of a speaker from a speech utterance using prosody features: pitch, log energy, zero-crossing rate, and the first three formant frequencies. Feature vectors are constructed from 11 statistical parameters of each feature. An artificial neural network (ANN) is chosen as the classifier owing to its universal function approximation capability. In an ANN-based classifier, the time required for training the network and for classification depends on the dimension of the feature vector. This work therefore focuses on developing a speaker emotion recognition system based on prosody features while reducing the dimensionality of the feature vector; principal component analysis (PCA) is used for this reduction. The Emotional Prosody Speech and Transcripts corpus from the Linguistic Data Consortium (LDC) and the Berlin emotional speech database are used to evaluate the proposed approach on seven emotion classes. The proposed method is compared with existing approaches and achieves better performance, with recognition rates of 75.32% on the Berlin emotional database and 84.5% on the LDC emotional speech database.
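To make the described pipeline concrete, below is a minimal, hypothetical sketch of the workflow the abstract outlines: frame-level prosody contours are extracted, summarized by per-utterance statistics, reduced with PCA, and classified with an ANN. The library choices (librosa, scikit-learn), the particular statistics, and the hyperparameters are illustrative assumptions and not the authors' implementation; formant extraction is omitted for brevity.

```python
# Hypothetical sketch of the prosody -> statistics -> PCA -> ANN pipeline.
# Assumptions (not from the paper): librosa/scikit-learn, the 7 statistics
# listed below (the paper uses 11), and the MLP/PCA hyperparameters.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prosody_contours(y, sr):
    """Frame-level prosody contours: pitch, log energy, zero-crossing rate."""
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)       # pitch contour (Hz)
    rms = librosa.feature.rms(y=y)[0]
    log_energy = np.log(rms + 1e-10)                    # log energy contour
    zcr = librosa.feature.zero_crossing_rate(y)[0]      # zero-crossing rate
    return [f0, log_energy, zcr]                        # formants omitted here


def utterance_vector(y, sr):
    """Per-utterance statistics of each contour (a subset of the paper's 11)."""
    stats = []
    for c in prosody_contours(y, sr):
        c = c[np.isfinite(c)]
        stats.extend([c.mean(), c.std(), c.min(), c.max(),
                      np.median(c), np.ptp(c), np.percentile(c, 25)])
    return np.array(stats)


# In practice each row of X would be utterance_vector(y, sr) for one labeled
# utterance; random placeholders stand in for real data in this sketch.
X = np.random.randn(100, 21)            # 3 contours x 7 statistics
labels = np.random.randint(0, 7, 100)   # seven emotion classes

# PCA reduces the feature dimension before the ANN (MLP) classifier.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))
model.fit(X, labels)
```

In this sketch the dimensionality reduction happens inside the pipeline, so the same PCA projection fitted on training data is applied at classification time, which is where the reported savings in training and classification time would come from.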
Pages: 521-531 (11 pages)