A novel frequency warping scale for speech emotion recognition

Cited by: 0
Authors
Singh, Premjeet [1]
Saha, Goutam [1]
Affiliation
[1] Indian Institute of Technology, Department of Electronics and Electrical Communication Engineering, Kharagpur, West Bengal, India
Keywords
Speech emotion recognition; Non-linear frequency warping; Constant-Q transform; Cepstral coefficients; Features; Magnitude
DOI
10.21437/Interspeech.2023-1600
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We investigate an optimised non-linear frequency warping scale for speech emotion recognition (SER). The proposed scale maps the speech spectrogram onto another time-frequency domain which is invariant to speaker-specific variations. Generally, the famous mel-scale designed on human audio perception is considered the de facto standard of frequency warping. However, designed mainly for speech recognition, the generalisability of mel on other speech processing tasks is debatable. Our experiments show that an emotion-specific scale designed on an SER database outperforms the standard mel-scale. Along with performance improvement, the proposed approach also provides insight into the emotion-relevant frequency regions for SER. Despite the database-dependent design of our approach, we find that the scale obtained from our experiments also shows SER performance improvement when tested on two other databases.
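For context on the baseline the abstract compares against, the conventional mel warping can be sketched as below. This is a minimal illustration of the standard mel formula (the common 2595/700 variant), not the emotion-specific scale proposed in the paper, whose form is not given in this record. The function names and the `warped_centres` helper are hypothetical and chosen only for this example.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: map a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warped_centres(f_min: float, f_max: float, n_bands: int) -> list[float]:
    """Return n_bands centre frequencies (Hz) spaced uniformly on the mel axis.

    Hypothetical helper: uniform spacing in the warped domain gives dense
    bands at low frequencies and sparse bands at high frequencies.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_bands)]

if __name__ == "__main__":
    centres = warped_centres(0.0, 8000.0, 10)
    for c in centres:
        print(f"{c:8.1f} Hz")
```

Any other warping scale can be dropped into the same structure by replacing the forward and inverse mapping functions, which is the general idea behind comparing alternative scales against mel.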
Pages: 3647-3651
Number of pages: 5