Applying a Convolutional Vision Transformer for Emotion Recognition in Children with Autism: Fusion of Facial Expressions and Speech Features

Authors
Wang, Yonggu [1]
Pan, Kailin [1]
Shao, Yifan [1]
Ma, Jiarong [1]
Li, Xiaojuan [2]
Affiliations
[1] Zhejiang Univ Technol, Coll Educ, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ Finance & Econ, Mental Hlth Educ Ctr, Hangzhou 310018, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 06
Funding
National Natural Science Foundation of China
Keywords
emotion recognition; multimodal feature fusion; deep learning; children with autism
DOI
10.3390/app15063083
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
With advances in digital technology, including deep learning and big data analytics, new methods have been developed for autism diagnosis and intervention. Emotion recognition and the detection of autism in children are prominent subjects in autism research. Previous research has typically used single-modal data to analyze the emotional states of children with autism and has found that the accuracy of recognition algorithms needs to be improved. Our study creates datasets on the facial and speech emotions of children with autism in their natural states. A convolutional vision transformer-based emotion recognition model is constructed for the two distinct datasets. The findings indicate that the model achieves accuracies of 79.12% and 83.47% for facial expression recognition and Mel spectrogram recognition, respectively. Consequently, we propose a multimodal data fusion strategy for emotion recognition and construct a feature fusion model based on an attention mechanism, which attains a recognition accuracy of 90.73%. Finally, using gradient-weighted class activation mapping, we produce prediction heat maps to visualize facial expressions and speech features under four emotional states. This study offers a technical direction for the use of intelligent perception technology in special education and enriches the theory of emotional intelligence perception of children with autism.
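The abstract mentions an attention-based feature fusion model but does not describe its internals. The sketch below only illustrates the general idea of attention-weighted fusion of two modality feature vectors; it is not the authors' architecture. The function names, the scoring vector `w_score`, the tanh scoring form, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(face_feat, speech_feat, w_score):
    """Fuse two equal-length feature vectors with attention weights.

    w_score is a hypothetical learned scoring vector (an assumption,
    not taken from the paper). Each modality receives a scalar score
    dot(w_score, tanh(feature)); the softmax of the two scores gives
    the weights of a convex combination of the two feature vectors.
    """
    modalities = [face_feat, speech_feat]
    scores = [
        sum(w * math.tanh(x) for w, x in zip(w_score, feat))
        for feat in modalities
    ]
    weights = softmax(scores)
    fused = [
        sum(a * feat[i] for a, feat in zip(weights, modalities))
        for i in range(len(face_feat))
    ]
    return fused, weights


# Toy inputs (made-up values standing in for per-modality embeddings).
face = [0.2, -1.0, 0.5, 0.3]
speech = [1.1, 0.4, -0.7, 0.9]
w = [0.5, -0.25, 0.1, 0.8]

fused, weights = attention_fusion(face, speech, w)
print("attention weights:", weights)
print("fused feature:", fused)
```

Because the weights are a softmax output, they are positive and sum to one, so each fused component lies between the corresponding facial and speech components. In a trained model the scoring parameters would be learned jointly with the classifier rather than fixed by hand as here.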
Pages: 35
Related papers
50 in total
  • [31] Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Guan, Haotian
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 2018, 11304 : 62 - 71
  • [32] Stacked Deep Convolutional Auto-Encoders for Emotion Recognition from Facial Expressions
    Ruiz-Garcia, Ariel
    Elshaw, Mark
    Altahhan, Abdulrahman
    Palade, Vasile
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 1586 - 1593
  • [33] Facial Emotion Recognition in Children with High Functioning Autism and Children with Social Phobia
    Nina Wong
    Deborah C. Beidel
    Dustin E. Sarver
    Valerie Sims
    Child Psychiatry & Human Development, 2012, 43 : 775 - 794
  • [34] Facial Emotion Recognition in Children with High Functioning Autism and Children with Social Phobia
    Wong, Nina
    Beidel, Deborah C.
    Sarver, Dustin E.
    Sims, Valerie
    CHILD PSYCHIATRY & HUMAN DEVELOPMENT, 2012, 43 (05) : 775 - 794
  • [35] Facial Micro-Expression Recognition Enhanced by Score Fusion and a Hybrid Model from Convolutional LSTM and Vision Transformer
    Zheng, Yufeng
    Blasch, Erik
    SENSORS, 2023, 23 (12)
  • [36] Reduced Recognition of Dynamic Facial Emotional Expressions and Emotion-Specific Response Bias in Children with an Autism Spectrum Disorder
    Kris Evers
    Jean Steyaert
    Ilse Noens
    Johan Wagemans
    Journal of Autism and Developmental Disorders, 2015, 45 : 1774 - 1784
  • [37] Reduced Recognition of Dynamic Facial Emotional Expressions and Emotion-Specific Response Bias in Children with an Autism Spectrum Disorder
    Evers, Kris
    Steyaert, Jean
    Noens, Ilse
    Wagemans, Johan
    JOURNAL OF AUTISM AND DEVELOPMENTAL DISORDERS, 2015, 45 (06) : 1774 - 1784
  • [38] The Effect of Context in Facial Emotion Recognition in Children With Autism Spectrum Disorder
    Noh, Jiyoung
    Chung, Kyongmee
    I-PERCEPTION, 2017, 8 : 52 - 53
  • [39] Fine-grained emotion recognition: fusion of physiological signals and facial expressions on spontaneous emotion corpus
    Setiawan, Feri
    Prabono, Aria Ghora
    Khowaja, Sunder Ali
    Kim, Wangsoo
    Park, Kyoungsoo
    Yahya, Bernardo Nugroho
    Lee, Seok-Lyong
    Hong, Jin Pyo
    INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2020, 35 (03) : 162 - 178
  • [40] Multimodal emotion recognition from facial expression and speech based on feature fusion
    Guichen Tang
    Yue Xie
    Ke Li
    Ruiyu Liang
    Li Zhao
    Multimedia Tools and Applications, 2023, 82 : 16359 - 16373