Model Smoothing using Virtual Adversarial Training for Speech Emotion Estimation using Spontaneity

Authors
Kuwahara, Toyoaki [1]
Orihara, Ryohei [1]
Sei, Yuichi [1]
Tahara, Yasuyuki [1]
Ohsuga, Akihiko [1]
Affiliations
[1] University of Electro-Communications, Graduate School of Informatics and Engineering, Tokyo, Japan
Keywords
Deep Learning; Cross Corpus; Virtual Adversarial Training; Emotion Recognition; Speech Processing; Spontaneity; Deep Neural Network; Perception
DOI
10.5220/0008958405700577
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The accuracy of speech-based emotion estimation has improved with the development of deep learning. However, most deep-learning approaches to emotion estimation rely on supervised learning, and large training datasets are difficult to obtain. In addition, accuracy drops when the environment of the training data differs significantly from that of the data encountered in actual use. To address these problems, we propose an emotion estimation model using virtual adversarial training (VAT), a semi-supervised learning method that improves model robustness. Furthermore, research on the spontaneity of speech has progressed year by year, and recent studies have shown that emotion classification accuracy improves when spontaneity is taken into account. We therefore investigate the effect of spontaneity in a cross-language setting. First, VAT hyperparameters were set in a preliminary experiment using a single corpus. Next, a cross-corpus evaluation experiment demonstrated the robustness of the resulting model. Finally, we evaluated the accuracy of emotion estimation with spontaneity taken into account and showed that doing so improves the accuracy of the model using VAT.
Pages: 570-577
Page count: 8