Improving the performance of the speaker emotion recognition based on low dimension prosody features vector

Cited by: 6
Authors
Gudmalwar, Ashishkumar Prabhakar [1 ]
Rao, Ch V. Rama [1 ]
Dutta, Anirban [1 ]
Affiliation
[1] Natl Inst Technol, Shillong, Meghalaya, India
Keywords
Prosody; PCA; Emotion recognition; Recognition rate; SPEECH;
DOI
10.1007/s10772-018-09576-4
CLC classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline codes
0808; 0809;
Abstract
Speaker emotion recognition is an important research problem with many applications in human-robot and human-computer interaction. This work deals with recognizing the emotion of a speaker from a speech utterance using prosody features: pitch, log energy, zero-crossing rate, and the first three formant frequencies. Feature vectors are constructed from 11 statistical parameters of each feature. An artificial neural network (ANN) is chosen as the classifier owing to its universal function approximation capability. In an ANN-based classifier, the time required for training the network and for classification depends on the dimension of the feature vector. This work therefore focuses on developing a speaker emotion recognition system based on prosody features while reducing the dimensionality of the feature vector; principal component analysis (PCA) is used for this reduction. The Emotional Prosody Speech and Transcripts corpus from the Linguistic Data Consortium (LDC) and the Berlin emotional speech database are used to evaluate the proposed approach on seven emotion classes. The proposed method is compared with existing approaches and achieves better performance, with recognition rates of 75.32% on the Berlin emotional database and 84.5% on the LDC emotional speech database.
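To make the described pipeline concrete, below is a minimal, hypothetical sketch of the workflow the abstract outlines: frame-level prosody contours are extracted, summarized by per-utterance statistics, reduced with PCA, and classified with an ANN. The library choices (librosa, scikit-learn), the particular statistics, and the hyperparameters are illustrative assumptions and not the authors' implementation; formant extraction is omitted for brevity.

```python
# Hypothetical sketch of the prosody -> statistics -> PCA -> ANN pipeline.
# Assumptions (not from the paper): librosa/scikit-learn, the 7 statistics
# listed below (the paper uses 11), and the MLP/PCA hyperparameters.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prosody_contours(y, sr):
    """Frame-level prosody contours: pitch, log energy, zero-crossing rate."""
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)       # pitch contour (Hz)
    rms = librosa.feature.rms(y=y)[0]
    log_energy = np.log(rms + 1e-10)                    # log energy contour
    zcr = librosa.feature.zero_crossing_rate(y)[0]      # zero-crossing rate
    return [f0, log_energy, zcr]                        # formants omitted here


def utterance_vector(y, sr):
    """Per-utterance statistics of each contour (a subset of the paper's 11)."""
    stats = []
    for c in prosody_contours(y, sr):
        c = c[np.isfinite(c)]
        stats.extend([c.mean(), c.std(), c.min(), c.max(),
                      np.median(c), np.ptp(c), np.percentile(c, 25)])
    return np.array(stats)


# In practice each row of X would be utterance_vector(y, sr) for one labeled
# utterance; random placeholders stand in for real data in this sketch.
X = np.random.randn(100, 21)            # 3 contours x 7 statistics
labels = np.random.randint(0, 7, 100)   # seven emotion classes

# PCA reduces the feature dimension before the ANN (MLP) classifier.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))
model.fit(X, labels)
```

In this sketch the dimensionality reduction happens inside the pipeline, so the same PCA projection fitted on training data is applied at classification time, which is where the reported savings in training and classification time would come from.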
Pages: 521-531 (11 pages)