A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video

Authors
Zhang, Wei [1]
Mao, Kaining [1]
Chen, Jie [1, 2]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
[2] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Source
PHENOMICS | 2024 / Vol. 4 / Issue 3
Keywords
Automatic depression detection; Natural language processing; Machine learning; Deep learning; Suicide
DOI
10.1007/s43657-023-00152-8
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Depression is one of the most common mental disorders, and its prevalence rises each year. Traditional diagnostic methods rely primarily on professional judgment, which is prone to individual bias. It is therefore crucial to design an effective and robust method for automated depression detection. Current artificial intelligence approaches are limited in their ability to extract features from long sentences, and current models become less robust when the input dimensionality is large. To address these limitations, a multimodal fusion model combining text, audio, and video was developed for both depression detection and depression assessment. In the text modality, pre-trained sentence embeddings were used to extract semantic representations, which were fed to a bidirectional long short-term memory (BiLSTM) network to predict depression. In the audio modality, principal component analysis (PCA) was used to reduce the dimensionality of the input feature space, and a support vector machine (SVM) was used to predict depression. In the video modality, extreme gradient boosting (XGBoost) was employed for both feature selection and depression detection. The final predictions were obtained by combining the outputs of the individual modalities with an ensemble voting algorithm. Experiments on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset showed a substantial performance improvement, with a weighted F1 score of 0.85, a root mean square error (RMSE) of 5.57, and a mean absolute error (MAE) of 4.48. The proposed model outperforms the baseline in both the detection and assessment tasks and was shown to perform better than other existing state-of-the-art depression detection methods.
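The fusion and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, so a hard majority vote over three binary per-modality labels is assumed here; the function names and the toy values in the usage note are hypothetical and are not taken from the paper. The metric functions follow the standard definitions of support-weighted F1, RMSE, and MAE.

```python
from collections import Counter
from math import sqrt


def majority_vote(text_pred, audio_pred, video_pred):
    """Hard majority vote over three per-modality binary labels (0 or 1).

    Assumption: the paper's ensemble voting may differ (e.g. weighted or
    soft voting); this is only the simplest variant.
    """
    counts = Counter([text_pred, audio_pred, video_pred])
    return counts.most_common(1)[0][0]


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for label in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += f1 * (tp + fn) / total  # tp + fn is the class support
    return score


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted severity scores."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted severity scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, `majority_vote(1, 0, 1)` yields the label 1 because two of the three modalities agree, and `mae([10, 4], [7, 8])` averages the absolute errors 3 and 4. The classification metric applies to the detection task, while RMSE and MAE apply to the assessment (severity score regression) task.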
Pages: 234 - 249
Page count: 16
Related papers
50 in total
  • [1] Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data
    Zhang, Zhenwei
    Zhang, Shengming
    Ni, Dong
    Wei, Zhaoguo
    Yang, Kongjun
    Jin, Shan
    Huang, Gan
    Liang, Zhen
    Zhang, Li
    Li, Linling
    Ding, Huijun
    Zhang, Zhiguo
    Wang, Jianhong
    SENSORS, 2024, 24 (12)
  • [2] Adolescent Depression Detection Model Based on Multimodal Data of Interview Audio and Text
    Zhang, Lei
    Fan, Yuanxiao
    Jiang, Jingwen
    Li, Yuchen
    Zhang, Wei
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2022, 32 (11)
  • [3] Multimodal Sentiment Analysis using Audio and Text for Crime Detection
    Boukabous, Mohammed
    Azizi, Mostafa
    2022 2ND INTERNATIONAL CONFERENCE ON INNOVATIVE RESEARCH IN APPLIED SCIENCE, ENGINEERING AND TECHNOLOGY (IRASET'2022), 2022, : 803 - 807
  • [4] Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text
    Pampouchidou, Anastasia
    Pediaditis, Matthew
    Giannakakis, Georgios
    Marias, Kostas
    Simantiraki, Olympia
    Manousos, Dimitrios
    Meriaudeau, Fabrice
    Yang, Fan
    Fazlollahi, Amir
    Roniotis, Alexandros
    Simos, Panagiotis
    Tsiknakis, Manolis
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON AUDIO/VISUAL EMOTION CHALLENGE (AVEC'16), 2016, : 27 - 34
  • [5] Enhancing depression detection: A multimodal approach with text extension and content fusion
    Chen, Jinyan
    Liu, Shuxian
    Xu, Meijia
    Wang, Peicheng
    EXPERT SYSTEMS, 2024,
  • [6] VIDEO EVENT DETECTION AND SUMMARIZATION USING AUDIO, VISUAL AND TEXT SALIENCY
    Evangelopoulos, G.
    Zlatintsi, A.
    Skoumas, G.
    Rapantzikos, K.
    Potamianos, A.
    Maragos, P.
    Avrithis, Y.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3553 - +
  • [7] MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech
    Mohammad, Farah
    Al Mansoor, Khulood Mohammed
    Computers, Materials and Continua, 2024, 81 (03) : 4125 - 4147
  • [8] MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
    Hayes, Thomas
    Zhang, Songyang
    Yin, Xi
    Pang, Guan
    Sheng, Sasha
    Yang, Harry
    Ge, Songwei
    Hu, Qiyuan
    Parikh, Devi
    COMPUTER VISION, ECCV 2022, PT VIII, 2022, 13668 : 431 - 449
  • [9] A robust video text detection approach using SVM
    Wei, Yi Cheng
    Lin, Chang Hong
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (12) : 10832 - 10840
  • [10] MULTIMODAL SPEECH EMOTION RECOGNITION USING AUDIO AND TEXT
    Yoon, Seunghyun
    Byun, Seokhyun
    Jung, Kyomin
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 112 - 118