An Enhanced Cross-Attention Based Multimodal Model for Depression Detection

Cited by: 0
Authors
Kou, Yifan [1 ]
Ge, Fangzhen [1 ,2 ]
Chen, Debao [2 ,3 ]
Shen, Longfeng [1 ,2 ,4 ]
Liu, Huaiyu [1 ]
Affiliations
[1] School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
[2] Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), Huaibei, Anhui, China
[3] School of Physics and Electronic Information, Huaibei Normal University, Huaibei, China
[4] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Neural networks
DOI
10.1111/coin.70019
Abstract
Depression, a prevalent mental disorder in modern society, significantly impacts people's daily lives. Although automated models for detecting depression have advanced recently, data scarcity, driven primarily by privacy concerns, remains a challenge: traditional speech features are limited in the diagnostic knowledge they can represent, while deep learning algorithms require substantial data. Moreover, existing neural-network-based multimodal methods overlook the heterogeneity gap between modalities, which can introduce redundant information. To address these issues, we propose a multimodal depression detection model based on an Enhanced Cross-Attention (ECA) mechanism, which explores text-speech interactions while accounting for modality heterogeneity. We mitigate data scarcity by fine-tuning pre-trained models. We further design an ECA-based modal fusion module that emphasizes similarity responses and updates the weight of each modal feature according to the similarity information between modal features. For speech feature extraction, we reduce the model's computational complexity by integrating a multi-window self-attention mechanism with the Fourier transform. Evaluated on the public DAIC-WOZ dataset, the proposed model achieves 80.0% accuracy and an average F1-score improvement of 4.3% over comparable methods. © 2025 Wiley Periodicals LLC.
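Implementation note: this record does not include the authors' code, so the sketch below is only a plausible PyTorch reconstruction of the two mechanisms the abstract names: a similarity-gated bidirectional cross-attention standing in for the ECA fusion module (modal weights updated from inter-modal similarity), and an FNet-style Fourier mixer standing in for the multi-window self-attention / Fourier-transform speech branch. All names here (FourierMix, SimilarityGatedCrossAttention, gate) are hypothetical, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FourierMix(nn.Module):
    """FNet-style token mixing: a 2-D FFT over the sequence and feature
    dimensions replaces quadratic self-attention, one plausible way to
    realize the complexity reduction the abstract describes."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep only the real part, as in FNet (Lee-Thorp et al., 2021).
        return torch.fft.fft2(x, dim=(-2, -1)).real


class SimilarityGatedCrossAttention(nn.Module):
    """Bidirectional text<->speech cross-attention whose fusion weights
    are driven by inter-modal similarity (hypothetical ECA sketch)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.text_to_speech = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.speech_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # The gate sees both pooled summaries plus their cosine similarity.
        self.gate = nn.Linear(2 * dim + 1, 2)

    def forward(self, text: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # text:   (B, T_t, dim), e.g. fine-tuned BERT token embeddings
        # speech: (B, T_s, dim), e.g. projected acoustic frame features
        t_attn, _ = self.text_to_speech(text, speech, speech)  # text queries speech
        s_attn, _ = self.speech_to_text(speech, text, text)    # speech queries text

        t_vec = t_attn.mean(dim=1)  # (B, dim) pooled cross-modal summaries
        s_vec = s_attn.mean(dim=1)

        # Similarity response: agreement between the two modalities steers
        # the per-modality fusion weights, down-weighting redundant content.
        sim = F.cosine_similarity(t_vec, s_vec, dim=-1).unsqueeze(-1)  # (B, 1)
        w = torch.softmax(self.gate(torch.cat([t_vec, s_vec, sim], dim=-1)), dim=-1)
        return w[:, :1] * t_vec + w[:, 1:] * s_vec  # (B, dim) fused feature


if __name__ == "__main__":
    B, T_TEXT, T_SPEECH, DIM = 2, 32, 200, 256
    text = torch.randn(B, T_TEXT, DIM)
    speech = FourierMix()(torch.randn(B, T_SPEECH, DIM))  # cheap token mixing
    fused = SimilarityGatedCrossAttention(DIM)(text, speech)
    print(fused.shape)  # torch.Size([2, 256])
```

The usage sketch in the __main__ guard shows the intended data flow only; the paper's exact ECA formulation, multi-window scheme, and pooling may differ.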
Related Papers
50 items in total
  • [21] A multimodal fusion network based on a cross-attention mechanism for the classification of Parkinsonian tremor and essential tremor
    Tang, Lu
    Hu, Qianyuan
    Wang, Xiangrui
    Liu, Long
    Zheng, Hui
    Yu, Wenjie
    Luo, Ningdi
    Liu, Jun
    Song, Chengli
    SCIENTIFIC REPORTS, 2024, 14 (1)
  • [22] SCANET: Improving multimodal representation and fusion with sparse- and cross-attention for multimodal sentiment analysis
    Wang, Hao
    Yang, Mingchuan
    Li, Zheng
    Liu, Zhenhua
    Hu, Jie
    Fu, Ziwang
    Liu, Feng
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [23] Object Detection in Multispectral Remote Sensing Images Based on Cross-Modal Cross-Attention
    Zhao, Pujie
    Ye, Xia
    Du, Ziang
    SENSORS, 2024, 24 (13)
  • [24] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [25] Dense Graph Convolutional With Joint Cross-Attention Network for Multimodal Emotion Recognition
    Cheng, Cheng
    Liu, Wenzhe
    Feng, Lin
    Jia, Ziyu
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024: 6672 - 6683
  • [26] Multimodal Dual Cross-Attention Fusion Strategy for Autonomous Garbage Classification System
    Xu, Huxiu
    Tang, Wei
    Li, Zhaoyang
    Qin, Kecheng
    Zou, Jun
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024: 13319 - 13329
  • [28] MHA: a multimodal hierarchical attention model for depression detection in social media
    Li, Zepeng
    An, Zhengyi
    Cheng, Wenchuan
    Zhou, Jiawei
    Zheng, Fang
    Hu, Bin
    HEALTH INFORMATION SCIENCE AND SYSTEMS, 2023, 11 (01)
  • [29] VISUAL QUESTION ANSWERING IN REMOTE SENSING WITH CROSS-ATTENTION AND MULTIMODAL INFORMATION BOTTLENECK
    Songara, Jayesh
    Pande, Shivam
    Choudhury, Shabnam
    Banerjee, Biplab
    Velmurugan, Rajbabu
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 6278 - 6281
  • [30] Multimodal Sentiment Analysis of Government Information Comments Based on Contrastive Learning and Cross-Attention Fusion Networks
    Mu, Guangyu
    Chen, Chuanzhi
    Li, Xiurong
    Li, Jiaxue
    Ju, Xiaoqing
    Dai, Jiaxiu
    IEEE ACCESS, 2024, 12: 165525 - 165538