Non-linear frequency warping using constant-Q transformation for speech emotion recognition

Cited by: 0
Authors
Singh, Premjeet [1]
Saha, Goutam [1]
Sahidullah, Md [2]
Affiliations
[1] Indian Inst Technol Kharagpur, Dept Elect & ECE, Kharagpur, W Bengal, India
[2] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
Keywords
Speech emotion recognition (SER); Constant-Q transform (CQT); Mel frequency analysis; Cross-corpora evaluation; FEATURES;
DOI
10.1109/ICCCI50826.2021.9402569
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since the lower-frequency regions of the speech signal contain more emotion-related information than the higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a comparative analysis of short-term acoustic features based on the STFT and the CQT for SER, with a deep neural network (DNN) as the back-end classifier. We optimize different parameters for both features. The CQT-based features outperform the STFT-based spectral features in the SER experiments. Further experiments with cross-corpora evaluation demonstrate that the CQT-based systems generalize better when trained on out-of-domain data.
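The resolution property described in the abstract can be illustrated with a minimal sketch of the textbook CQT bin layout, in which centre frequencies are geometrically spaced and each bin's bandwidth is its centre frequency divided by a fixed quality factor Q. The parameter values below (`f_min`, `bins_per_octave`, `n_octaves`, the 16 kHz sample rate and the 512-point FFT) are illustrative assumptions, not settings reported by the authors.

```python
import math


def cqt_bins(f_min=32.7, bins_per_octave=12, n_octaves=7):
    """Return (Q, [(centre_frequency_hz, bandwidth_hz), ...]) for a CQT layout.

    Textbook definitions: f_k = f_min * 2**(k / B) and Q = 1 / (2**(1 / B) - 1),
    so that bandwidth_k = f_k / Q. All default values are assumed for illustration.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(bins_per_octave * n_octaves):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        bins.append((f_k, f_k / q))
    return q, bins


def stft_bin_spacing(sample_rate=16000, n_fft=512):
    """Uniform spacing between STFT bins in Hz (assumed example settings)."""
    return sample_rate / n_fft


if __name__ == "__main__":
    q, bins = cqt_bins()
    print(f"Q = {q:.2f}, STFT spacing = {stft_bin_spacing():.2f} Hz")
    # Print one bin per octave: bandwidth grows with centre frequency.
    for f_k, bw in bins[::12]:
        print(f"centre {f_k:8.1f} Hz  bandwidth {bw:7.2f} Hz")
```

Under these assumed settings the lowest CQT bins are only a few hertz wide, far narrower than the uniform STFT spacing, while the highest bins are much wider. That is the trade-off the abstract refers to as higher frequency resolution at lower frequencies.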
Pages: 6
Related papers
50 in total
  • [1] Analysis of constant-Q filterbank based representations for speech emotion recognition
    Singh, Premjeet
    Waldekar, Shefali
    Sahidullah, Md
    Saha, Goutam
    [J]. DIGITAL SIGNAL PROCESSING, 2022, 130
  • [2] Recognition of Emotion Using Non-Linear Dynamics of Speech
    Harimi, Ali
    Shalizadi, Ali
    Ahmadyfard, Alireza
    [J]. 2014 7th International Symposium on Telecommunications (IST), 2014, : 446 - 451
  • [3] A constant-Q spectral transformation with improved frequency response
    Graziosi, DB
    dos Santos, CN
    Netto, SL
    Biscainho, LWR
    [J]. 2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS, 2004, : 544 - 547
  • [4] Automatic Emotion Recognition in Compressed Speech Using Acoustic and Non-Linear Features
    Garcia, N.
    Vasquez-Correa, J. C.
    Arias-Londono, J. D.
    Vargas-Bonilla, J. F.
    Orozco-Arroyave, J. R.
    [J]. 2015 20TH SYMPOSIUM ON SIGNAL PROCESSING, IMAGES AND COMPUTER VISION (STSIVA), 2015,
  • [5] SCQT-MaxViT: Speech Emotion Recognition With Constant-Q Transform and Multi-Axis Vision Transformer
    Ong, Kah Liang
    Lee, Chin Poo
    Lim, Heng Siong
    Lim, Kian Ming
    Mukaida, Takeki
    [J]. IEEE ACCESS, 2023, 11 : 63081 - 63091
  • [6] Feature extraction using non-linear transformation for robust speech recognition on the AURORA database
    Sharma, S
    Ellis, D
    Kajarekar, S
    Jain, P
    Hermansky, H
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1117 - 1120
  • [7] Speech Emotion Recognition Using Non-Linear Teager Energy Based Features in Noisy Environments
    Georgogiannis, Alexandros
    Digalakis, Vassilis
    [J]. 2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 2045 - 2049
  • [8] Speech enhancement based on nonnegative matrix factorization in constant-Q frequency domain
    Xu, Longting
    Wei, Zhilin
    Zaidi, Syed Faham Ali
    Ren, Bo
    Yang, Jichen
    [J]. APPLIED ACOUSTICS, 2021, 174
  • [9] Comparing linear and non-linear transformation of speech
    Mesbahi, Larbi
    Barreaud, Vincent
    Boeffard, Olivier
    [J]. PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON SIGNALS, SPEECH AND IMAGE PROCESSING/9TH WSEAS INTERNATIONAL CONFERENCE ON MULTIMEDIA, INTERNET & VIDEO TECHNOLOGIES, 2009, : 68 - 73
  • [10] NON-LINEAR FREQUENCY WARPING FOR VTLN USING SUBGLOTTAL RESONANCES AND THE THIRD FORMANT FREQUENCY
    Arsikere, Harish
    Lulich, Steven M.
    Alwan, Abeer
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7922 - 7926