Deep Multi-Modal Network Based Automated Depression Severity Estimation

Cited by: 23
Authors
Uddin, Md Azher [1 ]
Joolee, Joolekha Bibi [2 ]
Sohn, Kyung-Ah [3 ,4 ]
Affiliations
[1] Heriot Watt Univ, Dept Comp Sci, Dubai Campus, Dubai 38103, U Arab Emirates
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Global Campus, Yongin 17104, South Korea
[3] Ajou Univ, Dept Artificial Intelligence, Suwon 16499, South Korea
[4] Ajou Univ, Dept Software & Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation of Singapore
Keywords
Depression; Feature extraction; Three-dimensional displays; Convolutional neural networks; Optical flow; Long short term memory; Encoding; spatio-temporal networks; volume local directional structural pattern; temporal attentive pooling; multi-modal factorized bilinear pooling; FACIAL APPEARANCE; RECOGNITION;
DOI
10.1109/TAFFC.2022.3179478
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depression is a severe mental illness that impairs a person's capacity to function normally in personal and professional life. The assessment of depression usually requires a comprehensive examination by an expert professional. Recently, machine learning-based automatic depression assessment has received considerable attention as a route to reliable and efficient depression diagnosis. Various techniques for automated depression detection have been developed; however, several concerns remain to be investigated. In this work, we propose a novel deep multi-modal framework that effectively utilizes facial and verbal cues for automated depression assessment. Specifically, we first partition the audio and video data into fixed-length segments. These segments are then fed into spatio-temporal networks, which capture both spatial and temporal features and assign higher weights to the features that contribute most. In addition, a Volume Local Directional Structural Pattern (VLDSP)-based dynamic feature descriptor is introduced to extract facial dynamics by encoding their structural aspects. Afterwards, we employ the Temporal Attentive Pooling (TAP) approach to summarize the segment-level features for the audio and video data. Finally, the multi-modal factorized bilinear pooling (MFB) strategy is applied to fuse the multi-modal features effectively. An extensive experimental study reveals that the proposed method outperforms state-of-the-art approaches.
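The two pooling steps named in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: it uses the generic textbook forms of softmax-attention temporal pooling and of MFB (project both modalities, take the elementwise product, sum-pool over factors of size k, then power- and l2-normalize). All dimensions, weights, and function names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_attentive_pooling(segments, w):
    """Collapse per-segment features (T, d) into one clip-level vector
    via softmax attention weights -- a generic sketch of TAP."""
    scores = segments @ w                    # one scalar score per segment, (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # softmax over the T segments
    return alpha @ segments                  # attention-weighted sum, (d,)

def mfb_fuse(x, y, U, V, k, o):
    """Generic multi-modal factorized bilinear pooling: project both
    modalities to k*o dims, multiply elementwise, sum-pool over groups
    of k, then apply power and l2 normalization."""
    z = (U @ x) * (V @ y)                    # joint representation, (k*o,)
    z = z.reshape(o, k).sum(axis=1)          # sum pooling over factors, (o,)
    z = np.sign(z) * np.sqrt(np.abs(z))      # power normalization
    return z / (np.linalg.norm(z) + 1e-12)   # l2 normalization

# Toy dimensions (hypothetical, not taken from the paper)
T, d_a, d_v, k, o = 8, 32, 64, 5, 16
audio_segs = rng.standard_normal((T, d_a))   # segment-level audio features
video_segs = rng.standard_normal((T, d_v))   # segment-level video features

a = temporal_attentive_pooling(audio_segs, rng.standard_normal(d_a))
v = temporal_attentive_pooling(video_segs, rng.standard_normal(d_v))
U = rng.standard_normal((k * o, d_a))        # learned in practice; random here
V = rng.standard_normal((k * o, d_v))
fused = mfb_fuse(a, v, U, V, k, o)           # unit-norm fused vector, shape (o,)
print(fused.shape)
```

The sum-pool-over-factors step is what makes MFB "factorized": it approximates a full bilinear interaction between the two modalities with rank-k projection matrices instead of a cubic-size tensor.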
Pages: 2153-2167
Page count: 15
Related Papers (50 total)
  • [31] Deep Convolutional Neural Network for Multi-Modal Image Restoration and Fusion
    Deng, Xin
    Dragotti, Pier Luigi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (10) : 3333 - 3348
  • [32] DeepMIN: Deep Multi-modal Interest Network with Cognitive Learning Modules
    Zhang, Zhaoxiang
    Li, Zhiheng
    Jin, Jipeng
    Gao, Xiaofeng
    Yang, Xiongwen
    Zhang, Bo
    Xiao, Lei
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2024, PT 3, 2025, 14852 : 212 - 227
  • [33] Multi-modal deep network for RGB-D segmentation of clothes
    Joukovsky, B.
    Hu, P.
    Munteanu, A.
    ELECTRONICS LETTERS, 2020, 56 (09) : 432 - 434
  • [34] Deep unsupervised multi-modal fusion network for detecting driver distraction
    Zhang Y.
    Chen Y.
    Gao C.
    Neurocomputing, 2021, 421 : 26 - 38
  • [35] Hierarchical deep multi-modal network for medical visual question answering
    Gupta D.
    Suman S.
    Ekbal A.
    Expert Systems with Applications, 2021, 164
  • [36] A Novel Cross Modal Hashing Algorithm Based on Multi-modal Deep Learning
    Qu, Wen
    Wang, Daling
    Feng, Shi
    Zhang, Yifei
    Yu, Ge
    SOCIAL MEDIA PROCESSING, SMP 2015, 2015, 568 : 156 - 167
  • [37] Multi-Modal Deep Analysis for Multimedia
    Zhu, Wenwu
    Wang, Xin
    Li, Hongzhi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (10) : 3740 - 3764
  • [38] Reinforcement Learning-Based Resource Allocation for Streaming in a Multi-Modal Deep Space Network
    Ha, Taeyun
    Oh, Junsuk
    Lee, Donghyun
    Lee, Jeonghwa
    Jeon, Yongin
    Cho, Sungrae
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 201 - 206
  • [39] An Efficient Acute Lymphoblastic Leukemia Screen Framework Based on Multi-Modal Deep Neural Network
    Wang, Qiuming
    Huang, Tao
    Luo, Xiaojuan
    Luo, Xiaoling
    Li, Xuechen
    Cao, Ke
    Li, Defa
    Shen, Linlin
    INTERNATIONAL JOURNAL OF LABORATORY HEMATOLOGY, 2025,
  • [40] Effective deep learning-based multi-modal retrieval
    Wang, Wei
    Yang, Xiaoyan
    Ooi, Beng Chin
    Zhang, Dongxiang
    Zhuang, Yueting
    VLDB JOURNAL, 2016, 25 (01): : 79 - 101