Deep Multi-Modal Network Based Automated Depression Severity Estimation

Times Cited: 23
Authors
Uddin, Md Azher [1 ]
Joolee, Joolekha Bibi [2 ]
Sohn, Kyung-Ah [3 ,4 ]
Affiliations
[1] Heriot Watt Univ, Dept Comp Sci, Dubai Campus, Dubai 38103, U Arab Emirates
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Global Campus, Yongin 17104, South Korea
[3] Ajou Univ, Dept Artificial Intelligence, Suwon 16499, South Korea
[4] Ajou Univ, Dept Software & Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Depression; Feature extraction; Three-dimensional displays; Convolutional neural networks; Optical flow; Long short-term memory; Encoding; Spatio-temporal networks; Volume local directional structural pattern; Temporal attentive pooling; Multi-modal factorized bilinear pooling; Facial appearance; Recognition;
DOI
10.1109/TAFFC.2022.3179478
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depression is a severe mental illness that impairs a person's capacity to function normally in personal and professional life. The assessment of depression usually requires a comprehensive examination by an expert professional. Recently, machine learning-based automatic depression assessment has received considerable attention as a route to reliable and efficient depression diagnosis. Various techniques for automated depression detection have been developed; however, several concerns still need to be investigated. In this work, we propose a novel deep multi-modal framework that effectively utilizes facial and verbal cues for automated depression assessment. Specifically, we first partition the audio and video data into fixed-length segments. These segments are then fed as input into Spatio-Temporal Networks, which capture both spatial and temporal features and assign higher weights to the features that contribute most. In addition, a Volume Local Directional Structural Pattern (VLDSP)-based dynamic feature descriptor is introduced to extract facial dynamics by encoding their structural aspects. Afterwards, we employ the Temporal Attentive Pooling (TAP) approach to summarize the segment-level features of the audio and video data. Finally, the multi-modal factorized bilinear pooling (MFB) strategy is applied to fuse the multi-modal features effectively. An extensive experimental study shows that the proposed method outperforms state-of-the-art approaches.
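The record itself carries no code, but the abstract's pipeline (segment-level features, then temporal attentive pooling, then MFB fusion) lends itself to a short illustration. Below is a minimal PyTorch sketch of the two fusion-side stages. It uses a generic attentive-pooling formulation and the standard MFB definition from the literature (Yu et al., ICCV 2017), not the authors' released code; all feature dimensions, module names, and the scalar regression head are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentivePooling(nn.Module):
    """Summarizes a sequence of segment features into one vector by
    learning a softmax weight per segment (generic attentive pooling;
    the paper's TAP may differ in detail)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, h):                    # h: (batch, segments, dim)
        a = F.softmax(self.score(h), dim=1)  # one weight per segment
        return (a * h).sum(dim=1)            # (batch, dim)

class MFB(nn.Module):
    """Multi-modal factorized bilinear pooling (Yu et al., ICCV 2017):
    project each modality to k*o dims, multiply element-wise, sum-pool
    over the factor dimension k, then power- and L2-normalize."""
    def __init__(self, dim_a, dim_v, k=5, o=256, p_drop=0.1):
        super().__init__()
        self.k, self.o = k, o
        self.proj_a = nn.Linear(dim_a, k * o)
        self.proj_v = nn.Linear(dim_v, k * o)
        self.drop = nn.Dropout(p_drop)

    def forward(self, xa, xv):
        z = self.drop(self.proj_a(xa) * self.proj_v(xv))  # (batch, k*o)
        z = z.view(-1, self.o, self.k).sum(dim=2)         # sum-pool over k
        z = torch.sign(z) * torch.sqrt(z.abs() + 1e-12)   # power normalization
        return F.normalize(z, dim=1)                      # L2 normalization

# Hypothetical end-to-end use: pool per-modality segment features, fuse,
# and regress a scalar depression-severity score.
pool_a, pool_v = AttentivePooling(512), AttentivePooling(512)
fuse = MFB(dim_a=512, dim_v=512)
head = nn.Linear(256, 1)
audio_segs, video_segs = torch.randn(4, 30, 512), torch.randn(4, 30, 512)
severity = head(fuse(pool_a(audio_segs), pool_v(video_segs)))  # (4, 1)
```

The low-rank factorization is the point of MFB: a full bilinear interaction between two 512-d vectors with 256 outputs would require a 512 x 512 x 256 weight tensor, which the two k*o projections approximate at a small fraction of the parameters.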
Pages: 2153-2167
Page Count: 15