Improving the performance of the speaker emotion recognition based on low dimension prosody features vector

Cited by: 6
Authors
Gudmalwar, Ashishkumar Prabhakar [1]
Rao, Ch V. Rama [1]
Dutta, Anirban [1]
Affiliation
[1] Natl Inst Technol, Shillong, Meghalaya, India
Keywords
Prosody; PCA; Emotion recognition; Recognition rate; Speech
DOI
10.1007/s10772-018-09576-4
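Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract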
Speaker emotion recognition is an important research issue with many applications in human-robot interaction, human-computer interaction, and related areas. This work deals with recognizing the emotion of a speaker from a speech utterance. For this purpose, the features used are pitch, log energy, zero crossing rate, and the first three formant frequencies. Feature vectors are constructed from 11 statistical parameters of each feature. An Artificial Neural Network (ANN) is chosen as the classifier owing to its universal function approximation capabilities. In an ANN-based classifier, the time required for training the network, as well as for classification, depends on the dimension of the feature vector. This work focuses on developing a speaker emotion recognition system using prosody features and on reducing the dimensionality of the feature vectors. Here, principal component analysis (PCA) is used for feature vector dimensionality reduction. The Emotional Prosody Speech and Transcripts corpus from the Linguistic Data Consortium (LDC) and the Berlin emotional database are used to evaluate the proposed approach on the recognition of seven types of emotion. Compared with existing approaches, the proposed method obtains better performance. The experimental results show recognition rates of 75.32% and 84.5% for the Berlin emotional database and the LDC emotional speech database, respectively.
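As a rough illustration of how frame-level features can be turned into a fixed-length utterance vector, the sketch below computes two of the features named in the abstract (log energy and zero crossing rate) and summarizes each contour with a handful of statistics. It is not the authors' implementation: the abstract does not list the 11 statistical parameters, so the five statistics here are a hypothetical stand-in, and the frame length, hop size, and all function names are assumptions chosen for the example. Pitch, formant extraction, PCA, and the ANN classifier are left out.

```python
import math
import statistics


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def log_energy(frame, eps=1e-12):
    """Log of the sum of squared samples; eps avoids log(0) on silence."""
    return math.log(sum(x * x for x in frame) + eps)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def summarize(contour):
    """Five illustrative statistics (assumed; NOT the paper's 11 parameters)."""
    lo, hi = min(contour), max(contour)
    return [statistics.mean(contour), statistics.pstdev(contour),
            lo, hi, hi - lo]


def utterance_vector(signal, frame_len=400, hop=160):
    """Concatenate the statistics of the log-energy and ZCR contours.

    frame_len=400 and hop=160 correspond to 25 ms / 10 ms at 16 kHz,
    which is an assumption, not a value taken from the abstract.
    """
    fs = frames(signal, frame_len, hop)
    energy = [log_energy(f) for f in fs]
    zcr = [zero_crossing_rate(f) for f in fs]
    return summarize(energy) + summarize(zcr)


if __name__ == "__main__":
    # Synthetic 200 Hz tone sampled at 16 kHz, 0.1 s long (toy input).
    tone = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
    print(utterance_vector(tone))
```

With all six features and 11 statistics per feature, the same construction would give a 6 × 11 = 66-dimensional vector per utterance, which is the kind of vector that a dimensionality-reduction step such as PCA would then shorten before classification.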
Pages: 521-531
Number of pages: 11