Multimodal Deep Learning Framework for Mental Disorder Recognition

Cited by: 35
Authors
Zhang, Ziheng [1 ,4 ]
Lin, Weizhe [2 ]
Liu, Mingyu [3 ]
Mahmoud, Marwa [1 ]
Affiliations
[1] Univ Cambridge, Dept Comp Sci & Technol, Cambridge, England
[2] Univ Cambridge, Dept Engn, Cambridge, England
[3] Univ Oxford, Dept Phys, Oxford, England
[4] Tencent Jarvis Lab, Shenzhen, Peoples R China
DOI
10.1109/FG47880.2020.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current methods for mental disorder recognition mostly depend on clinical interviews and self-reported scores, which can be highly subjective. Building an automatic recognition system can help in the early detection of symptoms and provide insights into biological markers for diagnosis. It is, however, a challenging task, as it requires taking into account indicators from different modalities, such as facial expressions, gestures, acoustic features and verbal content. To address this, we propose a general-purpose multimodal deep learning framework in which multiple modalities - including acoustic, visual and textual features - are processed individually while cross-modality correlations are taken into account. Specifically, a Multimodal Deep Denoising Autoencoder (multi-DDAE) is designed to obtain multimodal representations of audio-visual features, followed by Fisher Vector encoding, which produces session-level descriptors. For the textual modality, Paragraph Vector (PV) is proposed to embed the transcripts of interview sessions into document representations that capture cues related to mental disorders. Following an early-fusion strategy, the audio-visual and textual features are then fused before being fed to a Multitask Deep Neural Network (DNN) as the final classifier. Our framework is evaluated on the automatic detection of two mental disorders, bipolar disorder (BD) and depression, using two datasets: the Bipolar Disorder Corpus (BDC) and the Extended Distress Analysis Interview Corpus (E-DAIC), respectively. Our experimental results show performance comparable to the state of the art in both BD and depression detection, demonstrating effective multimodal representation learning and the capability to generalise across different mental disorders.
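The fusion pipeline described in the abstract (session-level pooling of frame-wise audio-visual features, a document embedding of the transcript, and early fusion by concatenation before a classifier) can be sketched as follows. All shapes and the random stand-in features are illustrative assumptions; simple mean/std pooling stands in for the Fisher Vector encoding, and a random vector stands in for the multi-DDAE and Paragraph Vector outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper
n_frames, av_dim, doc_dim = 120, 32, 16

# Frame-level audio-visual features (stand-in for multi-DDAE representations)
av_frames = rng.normal(size=(n_frames, av_dim))

# Session-level descriptor: mean/std pooling over frames stands in here
# for the Fisher Vector encoding described in the abstract.
session_av = np.concatenate([av_frames.mean(axis=0), av_frames.std(axis=0)])

# Document embedding of the interview transcript (stand-in for Paragraph Vector)
doc_vec = rng.normal(size=doc_dim)

# Early fusion: concatenate both modalities into one feature vector
# before passing it to the final (multitask DNN) classifier.
fused = np.concatenate([session_av, doc_vec])
print(fused.shape)  # (2 * av_dim + doc_dim,) = (80,)
```

The early-fusion choice means the downstream classifier sees a single joint feature vector per session, so cross-modal interactions can be learned by the classifier itself rather than modelled separately per modality.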
Pages: 344 - 350
Page count: 7
Related Papers
50 records in total
  • [1] A Deep Learning and Multimodal Ambient Sensing Framework for Human Activity Recognition
    Yachir, Ali
    Amamra, Abdenour
    Djamaa, Badis
    Zerrouki, Ali
    Amour, Ahmed KhierEddine
    PROCEEDINGS OF THE 2019 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2019, : 101 - 105
  • [2] A multimodal deep learning framework using local feature representations for face recognition
    Al-Waisy, Alaa S.
    Qahwaji, Rami
    Ipson, Stanley
    Al-Fahdawi, Shumoos
    MACHINE VISION AND APPLICATIONS, 2018, 29 (01) : 35 - 54
  • [4] Intelligent ADL Recognition via IoT-Based Multimodal Deep Learning Framework
    Javeed, Madiha
    Al Mudawi, Naif
    Alazeb, Abdulwahab
    Almakdi, Sultan
    Alotaibi, Saud S.
    Chelloug, Samia Allaoua
    Jalal, Ahmad
    SENSORS, 2023, 23 (18)
  • [5] Emotion Recognition Using Multimodal Deep Learning
    Liu, Wei
    Zheng, Wei-Long
    Lu, Bao-Liang
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT II, 2016, 9948 : 521 - 529
  • [6] Emotion Recognition on Multimodal with Deep Learning and Ensemble
    Dharma, David Adi
    Zahra, Amalia
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 656 - 663
  • [8] A deep semantic framework for multimodal representation learning
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 9255 - 9276
  • [10] Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition
    Liu, Wei
    Qiu, Jie-Lin
    Zheng, Wei-Long
    Lu, Bao-Liang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 715 - 729