Harnessing multimodal approaches for depression detection using large language models and facial expressions

Cited: 0
Authors
Misha Sadeghi [1 ]
Robert Richer [1 ]
Bernhard Egger [2 ]
Lena Schindler-Gmelch [3 ]
Lydia Helene Rupp [3 ]
Farnaz Rahimi [1 ]
Matthias Berking [3 ]
Bjoern M. Eskofier [1 ]
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE)
[2] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Chair of Visual Computing (LGDV), Department of Computer Science
[3] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Chair of Clinical Psychology and Psychotherapy (KliPs)
[4] Institute of AI for Health,Translational Digital Health Group
[5] Helmholtz Zentrum München - German Research Center for Environmental Health
DOI
10.1038/s44184-024-00112-8
Abstract
Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, using the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames are integrated with the textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show that the best results are achieved by enriching the text data with a speech quality assessment, yielding a mean absolute error (MAE) of 2.85 and a root mean square error (RMSE) of 4.02. This study underscores the potential of automated depression detection, showing that text-only models are robust and effective while paving the way for multimodal analysis.
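The reported MAE and RMSE are standard regression metrics over predicted versus clinician-rated PHQ-8 scores (0-24 scale). A minimal sketch of how they are computed, using made-up illustrative scores rather than the study's data:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted PHQ-8 scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted PHQ-8 scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical example scores, for illustration only.
true_scores = [10, 4, 15, 7, 0]
pred_scores = [12, 3, 11, 8, 2]

print(mae(true_scores, pred_scores))   # → 2.0
print(rmse(true_scores, pred_scores))  # ≈ 2.28
```

Because RMSE squares each residual before averaging, it penalizes large misses more heavily than MAE; the gap between the paper's RMSE (4.02) and MAE (2.85) indicates that a minority of interviews are predicted with notably larger error.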