SENet-based speech emotion recognition using synthesis-style transfer data augmentation

Times cited: 0
Authors
Rajan R. [1, 3]
Hridya Raj T.V. [2, 3]
Affiliations
[1] Government Engineering College, Trivandrum
[2] College of Engineering, Trivandrum
[3] APJ Abdul Kalam Technological University, Thiruvananthapuram
Keywords
Channel-attention mechanism; Data augmentation; Multi-speaker; Style transfer; Text-to-speech conversion
DOI
10.1007/s10772-023-10071-8
Abstract
This paper addresses speech emotion recognition using a channel-attention mechanism combined with a synthesis-based data augmentation approach. The convolutional neural network (CNN) produces a channel-attention map by exploiting the inter-channel relationships of its features. The main issue in the speech emotion recognition domain is the lack of sufficient data for building an efficient model. The proposed work therefore uses a style-transfer scheme to achieve data augmentation through multi-voice synthesis from text. The scheme consists of a text-to-speech (TTS) module and a style-transfer module. In the front end, a TTS converter generates synthesized speech from text in a target speaker's voice. The style-transfer module then gives the synthesized speech its emotion, based on the emotional content fed to it. The TTS module is trained on the LibriSpeech and NUS-48E corpora. The quality of the synthesized speech samples is rated by subjective evaluation using the mean opinion score (MOS). The speech emotion recognition approach is systematically evaluated on the Berlin EMO-DB corpus. The channel-attention-based Squeeze-and-Excitation Network (SENet) shows promise in the speech emotion recognition experiments. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
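The abstract credits the channel-attention map of the Squeeze-and-Excitation Network (SENet) for the recognition results. To illustrate what such a block computes, here is a minimal sketch of a generic squeeze-and-excitation block. It is not the authors' implementation, because the record gives no layer sizes, reduction ratio, or placement in the network. The block squeezes each channel of a C x H x W feature map to its mean (global average pooling), passes these descriptors through a bottleneck of two fully connected layers (ReLU, then sigmoid), and rescales every channel by the resulting weight in (0, 1). The function names, the reduction ratio `r`, and the random weights are illustrative assumptions; a trained model would learn the weights.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]   # rows x cols
FeatureMap = List[Matrix]    # C channels, each H x W


def dense(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Fully connected layer: returns w @ v + b."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def squeeze(x: FeatureMap) -> Vector:
    """Squeeze: global average pooling, one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def excite(z: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Excitation: FC (C -> C/r) -> ReLU -> FC (C/r -> C) -> sigmoid."""
    hidden = [max(0.0, h) for h in dense(w1, z, b1)]
    return [1.0 / (1.0 + math.exp(-a)) for a in dense(w2, hidden, b2)]


def se_block(x: FeatureMap, w1: Matrix, b1: Vector,
             w2: Matrix, b2: Vector) -> Tuple[FeatureMap, Vector]:
    """Rescale each channel of x by its attention weight; also return the weights."""
    s = excite(squeeze(x), w1, b1, w2, b2)
    out = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]
    return out, s


def random_weights(c: int, r: int, seed: int = 0):
    """Hypothetical random initialisation of the two bottleneck layers."""
    rng = random.Random(seed)
    hidden = max(1, c // r)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(c)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(c)]
    return w1, [0.0] * hidden, w2, [0.0] * c


if __name__ == "__main__":
    # Toy 4-channel, 2 x 3 feature map (e.g. frequency x time after a conv layer).
    rng = random.Random(1)
    x = [[[rng.random() for _ in range(3)] for _ in range(2)] for _ in range(4)]
    w1, b1, w2, b2 = random_weights(c=4, r=2)
    y, s = se_block(x, w1, b1, w2, b2)
    print("channel weights:", [round(v, 3) for v in s])
```

The block only reweights channels and never changes the shape of the feature map, so it can sit after any convolutional layer. A quick sanity check is to set all weights and biases to zero. Every channel weight then becomes sigmoid(0) = 0.5, and the output is exactly half the input.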
Pages: 1017-1030
Page count: 13