Multimodal Depression Detection: Fusion Analysis of Paralinguistic, Head Pose and Eye Gaze Behaviors

Cited by: 92
Authors
Alghowinem, Sharifa [1 ]
Goecke, Roland [2 ]
Wagner, Michael [2 ,3 ,4 ,5 ]
Epps, Julien [6 ]
Hyett, Matthew [6 ]
Parker, Gordon [6 ]
Breakspear, Michael [7 ,8 ]
Affiliations
[1] Prince Sultan Univ, Riyadh 11586, Saudi Arabia
[2] Univ Canberra, Canberra, ACT 2617, Australia
[3] Australian Natl Univ, Canberra, ACT 0200, Australia
[4] Natl Ctr Biometric Studies Pty Ltd, Canberra, ACT 2600, Australia
[5] Tech Univ Berlin, D-10623 Berlin, Germany
[6] Univ New South Wales, Sydney, NSW 2052, Australia
[7] QIMR Berghofer Med Res Inst, Brisbane, Qld 4006, Australia
[8] Metro North Mental Hlth Serv, Brisbane, Qld 4029, Australia
Funding
Australian Research Council
Keywords
Depression detection; multimodal fusion; speaking behaviour; eye activity; head pose; audio
DOI
10.1109/TAFFC.2016.2634527
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
An estimated 350 million people worldwide are affected by depression. Using affective sensing technology, our long-term goal is to develop an objective multimodal system that augments clinical opinion during the diagnosis and monitoring of clinical depression. This paper steps towards that goal with a classification-oriented approach, in which feature selection, classification and fusion experiments are conducted to infer which types of behaviour (verbal and nonverbal), and which combinations of behaviours, best discriminate between depression and non-depression. Using statistical features extracted from speaking behaviour, eye activity, and head pose, we characterise the behaviour associated with major depression and examine classification performance for the individual modalities and for their fusion. On a real-world, clinically validated dataset of 30 severely depressed patients and 30 healthy control subjects, a Support Vector Machine is used for classification in combination with several feature selection techniques. Given the statistical nature of the extracted features, feature selection based on T-tests performed better than the other methods. Individual modality classification results were considerably higher than chance level (83 percent for speech, 73 percent for eye, and 63 percent for head). Fusing all modalities shows a marked improvement over the unimodal systems, demonstrating the complementary nature of the modalities. Among the fusion approaches investigated, feature fusion performed best, with up to 88 percent average accuracy; we attribute this to the compatible nature of the extracted statistical features.
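As a rough illustration of the pipeline the abstract describes (per-modality statistical features, T-test-based feature selection, SVM classification, and feature-level fusion by concatenation), the Python sketch below runs on synthetic stand-in data. The feature dimensions, selection threshold, linear kernel, and cross-validation setup are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch, assuming synthetic stand-in features: per-modality
# statistical features -> T-test feature selection -> SVM, plus feature
# fusion by concatenating the selected features of all modalities.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 30 depressed vs. 30 control subjects, matching the dataset size in the
# abstract; the per-modality feature dimensions are made up.
n_per_class = 30
dims = {"speech": 40, "eye": 25, "head": 15}
y = np.array([1] * n_per_class + [0] * n_per_class)

def synthetic_modality(dim):
    """Synthetic features in which only some dimensions differ by class."""
    shift = np.zeros(dim)
    shift[: dim // 3] = 0.8  # class-dependent mean shift on a subset
    depressed = rng.normal(shift, 1.0, size=(n_per_class, dim))
    control = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    return np.vstack([depressed, control])

def ttest_select(X, y, alpha=0.05):
    """Keep features whose two-sample T-test p-value is below alpha."""
    _, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
    return X[:, p < alpha]

def cv_accuracy(X, y):
    """Mean 5-fold cross-validated accuracy of a standardised linear SVM."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    return cross_val_score(clf, X, y, cv=5).mean()

# Unimodal classification after feature selection. Note: selecting on the
# full dataset before cross-validation leaks label information; it is done
# here only to keep the sketch short.
modalities = {m: ttest_select(synthetic_modality(d), y) for m, d in dims.items()}
for name, X in modalities.items():
    print(f"{name}: {cv_accuracy(X, y):.2f}")

# Feature fusion: concatenate the selected features of all modalities.
fused = np.hstack(list(modalities.values()))
print(f"fused: {cv_accuracy(fused, y):.2f}")

On this toy data the fused representation typically matches or beats the best single modality, mirroring the complementarity the abstract reports; the paper's actual paralinguistic, eye-activity, and head-pose statistics would replace the synthetic arrays.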
Pages: 478-490
Page count: 13
Related papers
50 in total (the first 10 are listed below)
  • [1] Multimodal Depression Detection: Fusion of Electroencephalography and Paralinguistic Behaviors Using a Novel Strategy for Classifier Ensemble
    Zhang, Xiaowei
    Hu, Bin
    Shen, Jian
    Din, Zia Ud
    Liu, Jinyong
    Wang, Gang
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2019, 23 (06) : 2265 - 2275
  • [2] "Owl' and "Lizard': patterns of head pose and eye pose in driver gaze classification
    Fridman, Lex
    Lee, Joonbum
    Reimer, Bryan
    Victor, Trent
    [J]. IET COMPUTER VISION, 2016, 10 (04) : 308 - 314
  • [3] Combining Head Pose and Eye Location Information for Gaze Estimation
    Valenti, Roberto
    Sebe, Nicu
    Gevers, Theo
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2012, 21 (02) : 802 - 815
  • [4] An eye model for uncalibrated eye gaze estimation under variable head pose
    Hnatow, Justin
    Savakis, Andreas
    [J]. BIOMETRIC TECHNOLOGY FOR HUMAN IDENTIFICATION IV, 2007, 6539
  • [5] Deep Head Pose: Gaze-Direction Estimation in Multimodal Video
    Mukherjee, Sankha S.
    Robertson, Neil Martin
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (11) : 2094 - 2107
  • [6] Driver Gaze Zone Estimation via Head Pose Fusion Assisted Supervision and Eye Region Weighted Encoding
    Yang, Yirong
    Liu, Chunsheng
    Chang, Faliang
    Lu, Yansha
    Liu, Hui
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2021, 67 (04) : 275 - 284
  • [7] Gaze Tracking by Joint Head and Eye Pose Estimation Under Free Head Movement
    Cristina, Stefania
    Camilleri, Kenneth P.
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
  • [8] Human Computer Interaction with Head Pose, Eye Gaze and Body Gestures
    Wang, Kang
    Zhao, Rui
    Ji, Qiang
    [J]. PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 789 - 789
  • [9] Gaze Detection Based on Head Pose Estimation in Smart TV
    Dat Tien Nguyen
    Shin, Kwang Yong
    Lee, Won Oh
    Kim, Yeong Gon
    Kim, Ki Wan
    Hong, Hyung Gil
    Park, Kang Ryoung
    Oh, CheonIn
    Lee, HanKyu
    Jeong, Youngho
    [J]. 2013 INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2013): FUTURE CREATIVE CONVERGENCE TECHNOLOGIES FOR NEW ICT ECOSYSTEMS, 2013, : 283 - 288
  • [10] Multi-Level Drowsiness Detection Based on Deep Feature Fusion of Eye and Head Pose
    Ye, Fang
    Li, Shunxin
    Yuan, Xin
    Li, Longfei
    [J]. PROCEEDINGS OF THE 2021 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC), 2021, : 107 - 111