Deep Multi-Modal Network Based Automated Depression Severity Estimation

Cited by: 23
Authors
Uddin, Md Azher [1 ]
Joolee, Joolekha Bibi [2 ]
Sohn, Kyung-Ah [3 ,4 ]
Affiliations
[1] Heriot Watt Univ, Dept Comp Sci, Dubai Campus, Dubai 38103, U Arab Emirates
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Global Campus, Yongin 17104, South Korea
[3] Ajou Univ, Dept Artificial Intelligence, Suwon 16499, South Korea
[4] Ajou Univ, Dept Software & Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Depression; Feature extraction; Three-dimensional displays; Convolutional neural networks; Optical flow; Long short term memory; Encoding; spatio-temporal networks; volume local directional structural pattern; temporal attentive pooling; multi-modal factorized bilinear pooling; FACIAL APPEARANCE; RECOGNITION;
DOI
10.1109/TAFFC.2022.3179478
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depression is a severe mental illness that impairs a person's capacity to function normally in personal and professional life. The assessment of depression usually requires a comprehensive examination by an expert professional. Recently, machine learning-based automatic depression assessment has received considerable attention as a route to reliable and efficient depression diagnosis. Various techniques for automated depression detection have been developed; however, several open issues remain. In this work, we propose a novel deep multi-modal framework that effectively utilizes facial and verbal cues for automated depression assessment. Specifically, we first partition the audio and video data into fixed-length segments. These segments are then fed into spatio-temporal networks, which capture both spatial and temporal features and assign higher weights to the features that contribute most. In addition, a Volume Local Directional Structural Pattern (VLDSP)-based dynamic feature descriptor is introduced to extract facial dynamics by encoding their structural aspects. Afterwards, we employ the Temporal Attentive Pooling (TAP) approach to summarize the segment-level features for the audio and video data. Finally, a multi-modal factorized bilinear pooling (MFB) strategy is applied to fuse the multi-modal features effectively. An extensive experimental study shows that the proposed method outperforms state-of-the-art approaches.
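The two summarization and fusion steps named in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, the random projection matrices, and the attention scoring vector are toy assumptions standing in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_attentive_pooling(segments, w):
    """TAP sketch: score each segment-level feature vector, softmax the
    scores into attention weights, and return the weighted sum.
    `segments` is (T, d); `w` stands in for a learned (d,) scorer."""
    scores = segments @ w                     # (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # softmax attention weights
    return alpha @ segments                   # (d,) video/audio summary

def mfb_fuse(x, y, U, V, k):
    """MFB sketch: project both modal summaries into a k*o-dim space,
    take the elementwise product, sum-pool over each group of k factors,
    then apply signed square-root and L2 normalization."""
    joint = (x @ U) * (y @ V)                 # (k*o,)
    o = joint.size // k
    pooled = joint.reshape(o, k).sum(axis=1)  # sum-pool over k factors
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # power norm
    return pooled / (np.linalg.norm(pooled) + 1e-12)    # L2 norm

# Toy dimensions (illustrative only).
d_v, d_a, k, o, T = 16, 12, 5, 8, 10
video_segments = rng.standard_normal((T, d_v))  # T segment-level features
audio_segments = rng.standard_normal((T, d_a))

v = temporal_attentive_pooling(video_segments, rng.standard_normal(d_v))
a = temporal_attentive_pooling(audio_segments, rng.standard_normal(d_a))
fused = mfb_fuse(v, a, rng.standard_normal((d_v, k * o)),
                 rng.standard_normal((d_a, k * o)), k)
print(fused.shape)  # (8,)
```

The sum-pool over k factors is what makes the bilinear interaction low-rank and keeps the fused vector compact; in the paper these projections are trained end-to-end rather than sampled at random.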
Pages: 2153-2167
Page count: 15
Related Papers
50 records
  • [41] Multi-modal haptic image recognition based on deep learning
    Han, Dong
    Nie, Hong
    Chen, Jinbao
    Chen, Meng
    Deng, Zhen
    Zhang, Jianwei
    SENSOR REVIEW, 2018, 38 (04) : 486 - 493
  • [42] Effective deep learning-based multi-modal retrieval
    Wang, Wei
    Yang, Xiaoyan
    Ooi, Beng Chin
    Zhang, Dongxiang
    Zhuang, Yueting
    The VLDB Journal, 2016, 25 : 79 - 101
  • [43] Deep Learning Based Multi-modal Registration for Retinal Imaging
    Arikan, Mustafa
    Sadeghipour, Amir
    Gerendas, Bianca
    Told, Reinhard
    Schmidt-Erfurth, Ursula
    INTERPRETABILITY OF MACHINE INTELLIGENCE IN MEDICAL IMAGE COMPUTING AND MULTIMODAL LEARNING FOR CLINICAL DECISION SUPPORT, 2020, 11797 : 75 - 82
  • [44] A Transformer-based multi-modal fusion network for 6D pose estimation
    Hong, Jia-Xin
    Zhang, Hong-Bo
    Liu, Jing-Hua
    Lei, Qing
    Yang, Li-Jie
    Du, Ji-Xiang
    INFORMATION FUSION, 2024, 105
  • [45] Multi-Modal Pedestrian Detection Algorithm Based on Deep Learning
    Li X.
    Fu H.
    Niu W.
    Wang P.
    Lü Z.
    Wang W.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2022, 56 (10): : 61 - 70
  • [46] MMAP: A Multi-Modal Automated Online Proctor
    Gadekar, Aumkar
    Oak, Shreya
    Revadekar, Abhishek
    Nimkar, Anant V.
    MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 314 - 325
  • [47] A multi-modal deep neural network for multi-class liver cancer diagnosis
    Khan, Rayyan Azam
    Fu, Minghan
    Burbridge, Brent
    Luo, Yigang
    Wu, Fang-Xiang
    NEURAL NETWORKS, 2023, 165 : 553 - 561
  • [48] Multi-source and Multi-modal Deep Network Embedding for Cross-network Node Classification
    Yang, Hongwei
    He, Hui
    Zhang, Weizhe
    Wang, Yan
    Jing, Lin
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (06)
  • [49] The Stability of Multi-modal Traffic Network
    Han Ling-Hui
    Sun Hui-Jun
    Zhu Cheng-Juan
    Wu Jian-Jun
    Jia Bin
    COMMUNICATIONS IN THEORETICAL PHYSICS, 2013, 60 (01) : 48 - 54
  • [50] The Stability of Multi-modal Traffic Network
    Han Ling-Hui
    Sun Hui-Jun
    Zhu Cheng-Juan
    Wu Jian-Jun
    Jia Bin
    Communications in Theoretical Physics, 2013, 60 (07) : 48 - 54