SENet-based speech emotion recognition using synthesis-style transfer data augmentation

Cited by: 0
Authors
Rajan R. [1, 3]
Hridya Raj T.V. [2, 3]
Affiliations
[1] Government Engineering College, Trivandrum
[2] College of Engineering, Trivandrum
[3] APJ Abdul Kalam Technological University, Thiruvananthapuram
Keywords
Channel-attention mechanism; Data augmentation; Multi-speaker; Style transfer; Text-to-speech conversion
DOI
10.1007/s10772-023-10071-8
Abstract
This paper addresses speech emotion recognition using a channel-attention mechanism combined with a synthesis-based data augmentation approach. A convolutional neural network (CNN) produces a channel attention map by exploiting the inter-channel relationships of its features. The main issue in the speech emotion recognition domain is insufficient data for building an efficient model. The proposed work uses a style transfer scheme to achieve data augmentation through multi-voice synthesis from text. The scheme consists of a text-to-speech (TTS) module and a style transfer module. In the front end, a TTS converter generates synthesized speech from text in a target speaker's voice. The style-transfer module then imparts emotion to the synthesized speech based on the emotional content fed to it. The text-to-speech module is trained using the LibriSpeech and NUS-48E corpora. The quality of the synthesized speech samples is rated through subjective evaluation using the mean opinion score (MOS). The speech emotion recognition approach is systematically evaluated using the Berlin EMO-DB corpus. The channel-attention-based Squeeze-and-Excitation Network (SENet) shows promise in the speech emotion recognition experiment. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
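The channel-attention idea mentioned in the abstract can be illustrated with a generic squeeze-and-excitation step. The sketch below is not the authors' network: the abstract does not give the architecture, so the layer sizes, the reduction ratio, the random weights, the omission of bias terms, and the names `se_block`, `w1`, and `w2` are all assumptions made for illustration. It pools each channel to one value (squeeze), passes the pooled vector through a small bottleneck to obtain one weight per channel (excitation), and rescales each channel by its weight.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def se_block(features, w1, w2):
    """Generic squeeze-and-excitation step (illustrative, biases omitted).

    features: list of C channels, each an H x W nested list.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Returns the rescaled features and the per-channel attention weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]
    weights = [sigmoid(z) for z in matvec(w2, hidden)]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[weights[c] * value for value in row] for row in channel]
        for c, channel in enumerate(features)
    ]
    return scaled, weights


# Hypothetical sizes: 4 channels of 3 x 5 features, reduction ratio r = 2.
random.seed(0)
C, H, W, r = 4, 3, 5, 2
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

scaled, weights = se_block(features, w1, w2)
print("channel attention weights:", [round(w, 3) for w in weights])
```

In a trained model `w1` and `w2` would be learned together with the convolutional layers, so channels that carry emotion-relevant information would receive weights closer to 1.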
Pages: 1017-1030
Page count: 13