A Generation of Enhanced Data by Variational Autoencoders and Diffusion Modeling

Cited by: 0
Authors
Kim, Young-Jun [1]
Lee, Seok-Pil [2]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Intelligent IoT, Seoul 03016, South Korea
Keywords
deep learning; generative adversarial networks; data augmentation; speech emotion recognition; speech emotion synthesis; diffusion; databases; features
DOI
10.3390/electronics13071314
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In emotion recognition from audio signals, the clarity and precision with which emotion is conveyed are of paramount importance. This study aims to augment audio waveforms (WAV) and enhance their emotional clarity using stable diffusion. Audio clips from EmoDB and RAVDESS, two well-known emotional speech datasets, were the main data sources for all experiments. A ResNet-based emotion recognition model was used to classify the augmented waveforms after emotion embedding and enhancement, and recognition results were compared before and after enhancement. The results show that applying a mel-spectrogram-based diffusion model to the existing waveforms increases the salience of the embedded emotions, resulting in better identification. This augmentation approach has significant potential to advance emotion recognition and synthesis and to improve applications in these areas.
Pages: 16
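
Illustrative sketch
The abstract mentions a mel-spectrogram-based diffusion model but gives no implementation details. As a rough illustration only, the sketch below shows the forward (noising) step of a standard denoising diffusion process applied to a spectrogram-shaped array. It is not the authors' method: the linear noise schedule, the step count, the function names, and the small synthetic "mel" array are all assumptions made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (an assumed, common default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(mel, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    `t` is a 0-based index into the schedule.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        [signal_scale * value + noise_scale * rng.gauss(0.0, 1.0) for value in row]
        for row in mel
    ]


# Hypothetical stand-in for a normalized mel spectrogram: 4 mel bins x 6 frames.
rng = random.Random(0)
mel = [[math.sin(0.5 * b + 0.3 * f) for f in range(6)] for b in range(4)]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)

noisy_early = q_sample(mel, 10, alpha_bars, rng)   # still close to the input
noisy_late = q_sample(mel, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

In a full pipeline, a trained network would reverse this noising step by step to produce an enhanced spectrogram, which would then be converted back to a waveform; both stages are outside the scope of this sketch.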