Applying a Convolutional Vision Transformer for Emotion Recognition in Children with Autism: Fusion of Facial Expressions and Speech Features

Cited by: 0
Authors
Wang, Yonggu [1]
Pan, Kailin [1]
Shao, Yifan [1]
Ma, Jiarong [1]
Li, Xiaojuan [2]
Affiliations
[1] Zhejiang Univ Technol, Coll Educ, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ Finance & Econ, Mental Hlth Educ Ctr, Hangzhou 310018, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 06
Funding
National Natural Science Foundation of China
Keywords
emotion recognition; multimodal feature fusion; deep learning; children with autism
DOI
10.3390/app15063083
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
With advances in digital technology, including deep learning and big data analytics, new methods have been developed for autism diagnosis and intervention. Emotion recognition and the detection of autism in children are prominent subjects in autism research. Previous research has typically relied on single-modal data to analyze the emotional states of children with autism, and the accuracy of the resulting recognition algorithms still needs to be improved. Our study builds datasets of the facial and speech emotions of children with autism in their natural states. A convolutional vision transformer-based emotion recognition model is constructed for the two distinct datasets. The model achieves accuracies of 79.12% for facial expression recognition and 83.47% for Mel spectrogram recognition. We therefore propose a multimodal data fusion strategy for emotion recognition and construct a feature fusion model based on an attention mechanism, which attains a recognition accuracy of 90.73%. Finally, gradient-weighted class activation mapping is used to produce prediction heat maps that visualize facial expressions and speech features under four emotional states. This study offers a technical direction for the use of intelligent perception technology in special education and enriches the theory of emotional intelligence perception of children with autism.
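The abstract does not describe how the attention-based feature fusion is built, so the following is only a minimal sketch of one generic way to fuse two modality feature vectors with attention weights. The function names (`softmax`, `attention_fuse`), the `query` vector, and the scaled dot-product scoring are all illustrative assumptions, not the authors' model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(face_feat, speech_feat, query):
    """Fuse two same-length feature vectors with attention weights.

    Hypothetical scheme: each modality is scored by a scaled dot product
    with `query` (which would be learned in a real model), the two scores
    are normalized with softmax, and the fused vector is the weighted sum.
    """
    if not (len(face_feat) == len(speech_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, feat)) / scale
        for feat in (face_feat, speech_feat)
    ]
    weights = softmax(scores)
    fused = [
        weights[0] * f + weights[1] * s
        for f, s in zip(face_feat, speech_feat)
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy 2-dimensional features; real features would come from the
    # facial-expression and Mel-spectrogram branches.
    fused, weights = attention_fuse([2.0, 0.0], [0.0, 2.0], [1.0, 0.0])
    print("weights:", weights)
    print("fused:", fused)
```

In this sketch the modality whose features align better with `query` receives the larger weight, so the fused vector leans toward the more informative branch for a given sample.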
Pages: 35