On the Effect of Log-Mel Spectrogram Parameter Tuning for Deep Learning-Based Speech Emotion Recognition

Cited by: 3
Authors
Mukhamediya, Azamat [1]
Fazli, Siamac [2]
Zollanvari, Amin [1]
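Affiliations
[1] Nazarbayev University, School of Engineering and Digital Sciences, Department of Electrical and Computer Engineering, Astana 010000, Kazakhstan
[2] Nazarbayev University, School of Engineering and Digital Sciences, Department of Computer Science, Astana 010000, Kazakhstan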
Keywords
Log-Mel spectrogram; speech emotion recognition; SqueezeNet; neural networks
DOI
10.1109/ACCESS.2023.3287093
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Speech emotion recognition (SER) has become a major area of investigation in human-computer interaction. Conventionally, SER is formulated as a classification problem that follows a common methodology: (i) extracting features from speech signals; and (ii) constructing an emotion classifier using the extracted features. With the advent of deep learning, however, the former stage has been integrated into the latter: deep neural networks (DNNs) trained on log-Mel spectrograms (LMS) of audio waveforms extract discriminative features directly from the LMS. A critical and often overlooked issue is that this procedure is carried out without relating the choice of LMS parameters to the performance of the trained DNN classifiers. It is commonplace in SER studies for practitioners to assume some "usual" values for these parameters and to devote their main effort to training and comparing various DNN architectures. In contrast with this common approach, in this work we choose a single lightweight pre-trained architecture, namely SqueezeNet, and shift our main effort to tuning the LMS parameters. Our empirical results on three publicly available SER datasets show that: (i) LMS parameters can considerably affect the performance of DNNs; and (ii) by tuning LMS parameters, highly competitive classification performance can be achieved. In particular, treating LMS parameters as hyperparameters and tuning them led to improvements of ~23%, ~10%, and ~11% over the use of "usual" LMS parameter values on the EmoDB, IEMOCAP, and SAVEE datasets, respectively.
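The abstract does not list which LMS parameters were tuned, so the following is only a minimal, self-contained sketch of the idea of treating LMS parameters as hyperparameters. It assumes the tunable parameters are the analysis window length, hop length, and number of Mel bands (the names `win_length`, `hop_length`, and `n_mels` are assumptions, not taken from the paper). It computes an LMS in pure Python (periodic Hann window, naive DFT, HTK-style Mel scale, natural log) and runs an exhaustive grid search in which each parameter combination is scored by a caller-supplied `evaluate` function. In the paper's setting that function would build LMS features and train and validate the SqueezeNet classifier; here it is a hypothetical placeholder. The naive DFT is used only for readability; in practice an FFT-based routine such as `librosa.feature.melspectrogram` followed by `librosa.power_to_db` would be the usual choice (the log base only rescales the features).

```python
import math
from itertools import product


def hz_to_mel(f):
    # HTK-style Mel scale (one common convention; assumed here).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping the n_fft // 2 + 1 DFT bins onto n_mels Mel bands."""
    top = hz_to_mel(sr / 2.0)
    # n_mels + 2 points equally spaced on the Mel scale from 0 Hz to Nyquist.
    hz_pts = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_mels + 1):
        lo, ctr, hi = hz_pts[j - 1], hz_pts[j], hz_pts[j + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= ctr:
                row.append((f - lo) / (ctr - lo))  # rising edge
            elif ctr < f < hi:
                row.append((hi - f) / (hi - ctr))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_spectrogram(x, sr, win_length, hop_length, n_mels, n_fft=None, eps=1e-10):
    """Return a list of frames, each a list of n_mels log Mel-band energies."""
    n_fft = n_fft or win_length
    if n_fft < win_length:
        raise ValueError("n_fft must be >= win_length")
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_length) for n in range(win_length)]
    bank = mel_filterbank(n_mels, n_fft, sr)
    frames = []
    for start in range(0, len(x) - win_length + 1, hop_length):
        # Window the frame and zero-pad it to n_fft samples.
        seg = [x[start + n] * window[n] for n in range(win_length)]
        seg += [0.0] * (n_fft - win_length)
        # Naive DFT power spectrum over the non-negative frequency bins.
        power = []
        for k in range(n_fft // 2 + 1):
            re = sum(s * math.cos(2 * math.pi * k * n / n_fft) for n, s in enumerate(seg))
            im = sum(s * math.sin(2 * math.pi * k * n / n_fft) for n, s in enumerate(seg))
            power.append(re * re + im * im)
        # Mel-band energies, then log; eps guards against log(0) for empty bands.
        frames.append([math.log(sum(w * p for w, p in zip(row, power)) + eps) for row in bank])
    return frames


def tune_lms_params(grid, evaluate):
    """Exhaustive grid search: score every parameter combination with `evaluate`
    (a hypothetical, caller-supplied train-and-validate function) and keep the best."""
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Because every combination requires retraining and re-validating the classifier, the cost of an exhaustive grid grows multiplicatively with the number of candidate values per parameter; random or Bayesian search are common alternatives when the grid is large.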
Pages: 61950-61957
Number of pages: 8