Deep Multi-Modal Network Based Automated Depression Severity Estimation

Cited by: 23
Authors
Uddin, Md Azher [1 ]
Joolee, Joolekha Bibi [2 ]
Sohn, Kyung-Ah [3 ,4 ]
Affiliations
[1] Heriot Watt Univ, Dept Comp Sci, Dubai Campus, Dubai 38103, U Arab Emirates
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Global Campus, Yongin 17104, South Korea
[3] Ajou Univ, Dept Artificial Intelligence, Suwon 16499, South Korea
[4] Ajou Univ, Dept Software & Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Depression; feature extraction; three-dimensional displays; convolutional neural networks; optical flow; long short-term memory; encoding; spatio-temporal networks; volume local directional structural pattern; temporal attentive pooling; multi-modal factorized bilinear pooling; facial appearance; recognition
DOI
10.1109/TAFFC.2022.3179478
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Depression is a severe mental illness that impairs a person's capacity to function normally in personal and professional life. Assessing depression usually requires a comprehensive examination by an expert professional. Recently, machine-learning-based automatic depression assessment has received considerable attention as a route to reliable and efficient diagnosis. Various techniques for automated depression detection have been developed; however, several concerns remain to be investigated. In this work, we propose a novel deep multi-modal framework that effectively exploits facial and verbal cues for automated depression assessment. Specifically, we first partition the audio and video data into fixed-length segments. These segments are then fed into spatio-temporal networks, which capture both spatial and temporal features and assign higher weights to the features that contribute most. In addition, a Volume Local Directional Structural Pattern (VLDSP)-based dynamic feature descriptor is introduced to extract facial dynamics by encoding their structural aspects. Afterwards, we employ Temporal Attentive Pooling (TAP) to summarize the segment-level features of the audio and video data. Finally, multi-modal factorized bilinear pooling (MFB) is applied to fuse the multi-modal features effectively. An extensive experimental study shows that the proposed method outperforms state-of-the-art approaches.
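Two of the components named in the abstract, Temporal Attentive Pooling (TAP) and multi-modal factorized bilinear pooling (MFB), can be sketched compactly. The following is a minimal PyTorch illustration, not the authors' implementation: the feature dimension (128), fused dimension (256), factor count (k = 4), and segment count are assumed values chosen for the example, since the abstract does not specify them.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TemporalAttentivePooling(nn.Module):
        """Summarizes a sequence of segment-level features into one vector
        via learned attention weights (a generic TAP sketch)."""
        def __init__(self, dim=128):
            super().__init__()
            self.score = nn.Linear(dim, 1)  # one attention score per segment

        def forward(self, seq):                                    # seq: (B, T, dim)
            w = torch.softmax(self.score(seq).squeeze(-1), dim=1)  # (B, T)
            return (seq * w.unsqueeze(-1)).sum(dim=1)              # (B, dim)

    class MFBFusion(nn.Module):
        """Multi-modal factorized bilinear pooling: low-rank projections,
        element-wise product, sum pooling, power and l2 normalization."""
        def __init__(self, dim_a=128, dim_v=128, out_dim=256, k=4):
            super().__init__()
            self.k = k
            self.proj_a = nn.Linear(dim_a, out_dim * k)
            self.proj_v = nn.Linear(dim_v, out_dim * k)

        def forward(self, x_a, x_v):
            joint = self.proj_a(x_a) * self.proj_v(x_v)            # (B, out_dim*k)
            joint = joint.view(joint.size(0), -1, self.k).sum(2)   # sum-pool groups of k
            joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-12)  # power norm
            return F.normalize(joint, dim=1)                       # l2 norm

    # Hypothetical usage: pool per-segment features, then fuse the two modalities.
    tap_a, tap_v, fuse = TemporalAttentivePooling(), TemporalAttentivePooling(), MFBFusion()
    audio_segments = torch.randn(8, 20, 128)   # batch of 8 clips, 20 audio segments each
    video_segments = torch.randn(8, 20, 128)
    fused = fuse(tap_a(audio_segments), tap_v(video_segments))
    print(fused.shape)                          # torch.Size([8, 256])

The factorized form replaces the full bilinear interaction tensor with two low-rank projections, keeping the parameter count linear in the input dimensions; that is the usual motivation for MFB over full bilinear pooling.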
Pages: 2153-2167 (15 pages)
Related papers (50 in total)
• [1] Multi-Modal Depression Detection and Estimation. Yang, Le. 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2019: 26-30.
• [2] Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level. Sun, Hao; Liu, Jiaqing; Chai, Shurong; Qiu, Zhaolin; Lin, Lanfen; Huang, Xinyin; Chen, Yenwei. Sensors, 2021, 21(14).
• [3] Channel Estimation Algorithm Based on Multi-modal Neural Network. Xue, Wenli; Zhu, Hongwei; Nian, Zhongyuan; Wu, Xueyang; Cui, Mingshi; Mu, Chunfang; Yang, Weiming; Chen, Zhigang. 2024 9th International Conference on Signal and Image Processing (ICSIP), 2024: 206-210.
• [4] Deep Robust Unsupervised Multi-Modal Network. Yang, Yang; Wu, Yi-Feng; Zhan, De-Chuan; Liu, Zhi-Bin; Jiang, Yuan. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 5652-5659.
• [5] Predicting Depression Severity by Multi-Modal Feature Engineering and Fusion. Samareh, Aven; Jin, Yan; Wang, Zhangyang; Chang, Xiangyu; Huang, Shuai. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 8147-8148.
• [6] Automated diagnosis of breast cancer using multi-modal datasets: A deep convolution neural network based approach. Muduli, Debendra; Dash, Ratnakar; Majhi, Banshidhar. Biomedical Signal Processing and Control, 2022, 71.
• [7] Multi-modal deep learning for automated assembly of periapical radiographs. Pfaender, L.; Schneider, L.; Buettner, M.; Krois, J.; Meyer-Lueckel, H.; Schwendicke, F. Journal of Dentistry, 2023, 135.
• [8] MSAFusionNet: Multiple Subspace Attention Based Deep Multi-modal Fusion Network. Zhang, Sen; Zhang, Changzheng; Wang, Lanjun; Li, Cixing; Tu, Dandan; Luo, Rui; Qi, Guojun; Luo, Jiebo. Machine Learning in Medical Imaging (MLMI 2019), 2019, 11861: 54-62.
• [9] Robust Deep Multi-modal Learning Based on Gated Information Fusion Network. Kim, Jaekyum; Koh, Junho; Kim, Yecheol; Choi, Jaehyung; Hwang, Youngbae; Choi, Jun Won. Computer Vision - ACCV 2018, Pt IV, 2019, 11364: 90-106.