SPEAKER VARIABILITY IN EMOTION RECOGNITION - AN ADAPTATION BASED APPROACH

Cited by: 0
Authors
Ding, Ni [1]
Sethu, Vidhyasaharan [1]
Epps, Julien [1]
Ambikairajah, Eliathamby [1]
Affiliations
[1] Univ New S Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia
Keywords
Speaker adaptation; emotion classification; speaker normalisation; bootstrapping
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
None of the features commonly used in automatic emotion classification systems completely dissociates emotion-specific information from speaker-specific information. This speaker-specific variability consequently degrades classification performance, and existing systems frequently mitigate it with some form of speaker normalisation. Speaker adaptation offers an alternative to normalisation. This paper proposes a novel bootstrapping technique for GMM-based emotion classification, in which appropriate initial models are selected from a large training pool before the emotion models are adapted to the speaker. Evaluations on the LDC Emotional Prosody and FAU Aibo corpora show that an emotion classification system based on the proposed bootstrapping method outperforms systems based on speaker normalisation, provided that a small amount of labelled adaptation data is available. It also outperforms speaker adaptation from common initial models estimated from all training speakers.
Pages: 5101-5104
Page count: 4
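
The bootstrapping idea summarised in the abstract (choose an initial model from a pool, then adapt it to the target speaker) can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the adaptation rule, so everything below is an assumption made for illustration: the initial model is picked by average log-likelihood on the adaptation data, adaptation is relevance-MAP updating of the GMM means only, and the names `select_initial_model`, `map_adapt_means`, and `relevance` are hypothetical. The sketch handles a single emotion's model; a full classifier would repeat this per emotion class.

```python
import numpy as np

# A GMM is represented as a tuple (weights, means, variances) with
# diagonal covariances: shapes (K,), (K, D), (K, D).

def log_gauss_diag(X, means, variances):
    """Per-component log densities, shape (n_frames, n_components)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def avg_loglik(X, gmm):
    """Average per-frame log-likelihood of X under the GMM."""
    weights, means, variances = gmm
    lp = log_gauss_diag(X, means, variances) + np.log(weights)
    m = lp.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return float(np.mean(m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))))

def select_initial_model(X, pool):
    """ASSUMED criterion: pick the pool model that best fits the adaptation data."""
    return int(np.argmax([avg_loglik(X, g) for g in pool]))

def map_adapt_means(X, gmm, relevance=16.0):
    """ASSUMED rule: relevance-MAP adaptation of the means only."""
    weights, means, variances = gmm
    lp = log_gauss_diag(X, means, variances) + np.log(weights)
    lp -= lp.max(axis=1, keepdims=True)
    post = np.exp(lp)
    post /= post.sum(axis=1, keepdims=True)          # responsibilities
    n_k = post.sum(axis=0)                           # soft counts per component
    ex = post.T @ X / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]       # data vs. prior trade-off
    return weights, alpha * ex + (1 - alpha) * means, variances

# Toy usage with synthetic data (not from the paper).
rng = np.random.default_rng(0)
pool = [
    (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    (np.array([0.5, 0.5]), np.array([[5.0, 5.0], [6.0, 6.0]]), np.ones((2, 2))),
]
X = rng.normal(loc=5.5, scale=1.0, size=(200, 2))    # labelled adaptation frames
idx = select_initial_model(X, pool)
adapted = map_adapt_means(X, pool[idx])
```

With only a small amount of adaptation data, a larger `relevance` keeps the adapted means close to the selected initial model, which is why the choice of initial model matters in this kind of scheme.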