The Detection of Depression Using Multimodal Models Based on Text and Voice Quality Features

Cited by: 11
Authors
Solieman, Hanadi [1]
Pustozerov, Evgenii A. [1, 2]
Affiliations
[1] St Petersburg Electrotech Univ LETI, St Petersburg, Russia
[2] Almazov Natl Med Res Ctr, St Petersburg, Russia
Keywords
Depression; Deep Learning; text analysis; voice quality; semi-contextual; word-level; speaker-independent; DAIC-WOZ; classification
DOI
10.1109/ElConRus51938.2021.9396540
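Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809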
Abstract
The article presents a proof of concept that depression can be diagnosed automatically from audio recordings of individuals' voices. The DAIC-WOZ database was used as the data source. Audio and textual data were preprocessed and converted into a set of optimized parameters for two models. Deep Learning models were used to detect depression from the transcripts of the audio recordings and from voice quality features. We created a word-level text analysis model using Natural Language Processing (NLP) techniques, and a voice quality analysis model along the tense-to-breathy dimension. The text analysis model achieved its best performance with an F1-score of 0.8 (0.42) for non-depressed (depressed) individuals, while the voice quality model scored 0.76 (0.38). The result is two models to be implemented in a system for the diagnosis of depression.
Pages: 1843-1848
Page count: 6
Related papers
50 in total
  • [21] Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data
    Zhang, Zhenwei
    Zhang, Shengming
    Ni, Dong
    Wei, Zhaoguo
    Yang, Kongjun
    Jin, Shan
    Huang, Gan
    Liang, Zhen
    Zhang, Li
    Li, Linling
    Ding, Huijun
    Zhang, Zhiguo
    Wang, Jianhong
    SENSORS, 2024, 24 (12)
  • [22] Text-Based Detection of the Risk of Depression
    Havigerova, Jana M.
    Haviger, Jiri
    Kucera, Dalibor
    Hoffmannova, Petra
    FRONTIERS IN PSYCHOLOGY, 2019, 10
  • [23] Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features
    Nasir, Md
    Jati, Arindam
    Shivakumar, Prashanth Gurunath
    Chakravarthula, Sandeep Nallan
    Georgiou, Panayiotis
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON AUDIO/VISUAL EMOTION CHALLENGE (AVEC'16), 2016, : 43 - 50
  • [24] Voice activity detection based on conditional random fields using multiple features
    Saito, Akira
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2086 - 2089
  • [25] Towards using Breathing Features for Multimodal Estimation of Depression Severity
    Pessanha, Francisca
    Kaya, Heysem
    Salah, Alkim Almila Akdag
    Salah, Albert Ali
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 128 - 138
  • [26] Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems
    Park, Soo Jin
    Yeung, Gary
    Kreiman, Jody
    Keating, Patricia A.
    Alwan, Abeer
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1522 - 1526
  • [27] Explainable Depression Detection using Multimodal Behavioural Cues
    Gahalawat, Monika
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 721 - 725
  • [28] Text Detection based on MSER and CNN Features
    Turki, Houssem
    Ben Halima, Mohamed
    Alimi, Adel M.
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 949 - 954
  • [29] Face detection using multimodal density models
    Yang, MH
    Kriegman, D
    Ahuja, N
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2001, 84 (02) : 264 - 284
  • [30] Scene Text Detection based on Structural Features
    Nguyen, Khanh
    Ngo Duc Thanh
    2016 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS, AND ITS APPLICATIONS (IC3INA) - RECENT PROGRESS IN COMPUTER, CONTROL, AND INFORMATICS FOR DATA SCIENCE, 2016, : 48 - 53