Multimodal Deep Learning Framework for Mental Disorder Recognition

Cited by: 35
Authors
Zhang, Ziheng [1,4]
Lin, Weizhe [2]
Liu, Mingyu [3]
Mahmoud, Marwa [1]
Affiliations
[1] Univ Cambridge, Dept Comp Sci & Technol, Cambridge, England
[2] Univ Cambridge, Dept Engn, Cambridge, England
[3] Univ Oxford, Dept Phys, Oxford, England
[4] Tencent Jarvis Lab, Shenzhen, Peoples R China
DOI: 10.1109/FG47880.2020.00033
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Current methods for mental disorder recognition mostly depend on clinical interviews and self-reported scores, which can be highly subjective. Building an automatic recognition system can help in the early detection of symptoms and provide insights into the biological markers for diagnosis. It is, however, a challenging task, as it requires taking into account indicators from different modalities, such as facial expressions, gestures, acoustic features and verbal content. To address this, we propose a general-purpose multimodal deep learning framework in which multiple modalities, including acoustic, visual and textual features, are processed individually while cross-modality correlations are taken into account. Specifically, a Multimodal Deep Denoising Autoencoder (multi-DDAE) is designed to obtain multimodal representations of audio-visual features, followed by Fisher Vector encoding, which produces session-level descriptors. For the textual modality, Paragraph Vector (PV) is used to embed the transcripts of interview sessions into document representations that capture cues related to mental disorders. Following an early fusion strategy, the audio-visual and textual features are then fused before being fed to a Multitask Deep Neural Network (DNN) as the final classifier. Our framework is evaluated on the automatic detection of two mental disorders, bipolar disorder (BD) and depression, using two datasets: the Bipolar Disorder Corpus (BDC) and the Extended Distress Analysis Interview Corpus (E-DAIC), respectively. Our experimental results show performance comparable to the state of the art in both BD and depression detection, demonstrating effective multimodal representation learning and the ability to generalise across different mental disorders.
Pages: 344 - 350
Page count: 7
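
The abstract outlines a concrete pipeline: a multimodal deep denoising autoencoder over audio-visual features, Fisher Vector encoding into session-level descriptors, Paragraph Vector embeddings of the transcripts, early fusion, and a multitask DNN classifier. Below is a minimal PyTorch sketch of that pipeline's overall shape. All module names, layer sizes, the feature dimensions, the three-class BD / two-class depression heads, the mean-pooling stand-in for Fisher Vector encoding, and the random placeholder for the Paragraph Vector embedding are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the pipeline described in the abstract; details marked
# "assumed" below are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDDAE(nn.Module):
    """Multimodal deep denoising autoencoder: per-modality encoders feed a
    shared code, and per-modality decoders reconstruct the clean inputs."""
    def __init__(self, audio_dim, visual_dim, hidden_dim=128):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.enc_v = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        self.shared = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU())
        self.dec_a = nn.Linear(hidden_dim, audio_dim)
        self.dec_v = nn.Linear(hidden_dim, visual_dim)

    def forward(self, audio, visual, noise_std=0.1):
        # Denoising objective: corrupt the inputs, reconstruct the clean ones.
        a = self.enc_a(audio + noise_std * torch.randn_like(audio))
        v = self.enc_v(visual + noise_std * torch.randn_like(visual))
        z = self.shared(torch.cat([a, v], dim=-1))  # joint audio-visual code
        return z, self.dec_a(z), self.dec_v(z)

class MultitaskDNN(nn.Module):
    """Shared trunk with one classification head per disorder."""
    def __init__(self, in_dim, classes_per_task=(3, 2)):  # assumed: 3 BD levels, 2 depression classes
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 128), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(128, c) for c in classes_per_task])

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]

torch.manual_seed(0)
audio = torch.randn(32, 88)    # 32 frames; 88-dim acoustic features (assumed size)
visual = torch.randn(32, 136)  # 136-dim visual features per frame (assumed size)

ddae = MultiDDAE(audio_dim=88, visual_dim=136)
z, rec_a, rec_v = ddae(audio, visual)
ddae_loss = F.mse_loss(rec_a, audio) + F.mse_loss(rec_v, visual)  # pretraining loss

# The paper aggregates frame-level codes into a session-level descriptor with
# Fisher Vector encoding; a simple mean over frames stands in for it here.
session_av = z.mean(dim=0, keepdim=True)   # (1, 128)
session_txt = torch.randn(1, 100)          # Paragraph Vector placeholder (assumed dim)

# Early fusion: concatenate modalities, then classify with the multitask DNN.
clf = MultitaskDNN(in_dim=128 + 100)
logits_bd, logits_dep = clf(torch.cat([session_av, session_txt], dim=-1))
print(logits_bd.shape, logits_dep.shape)   # torch.Size([1, 3]) torch.Size([1, 2])
```

In a fuller reproduction, the transcript embedding could be produced with a Paragraph Vector implementation such as gensim's Doc2Vec, and the mean-pooling stand-in replaced with a GMM-based Fisher Vector encoder fit on the frame-level codes.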