The Detection of Depression Using Multimodal Models Based on Text and Voice Quality Features

Cited by: 11
Authors
Solieman, Hanadi [1]
Pustozerov, Evgenii A. [1, 2]
Affiliations
[1] St Petersburg Electrotech Univ LETI, St Petersburg, Russia
[2] Almazov Natl Med Res Ctr, St Petersburg, Russia
Keywords
Depression; Deep Learning; text analysis; voice quality; semi-contextual; word-level; speaker-independent; DAICWOZ; CLASSIFICATION;
DOI
10.1109/ElConRus51938.2021.9396540
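Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract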
This article presents a proof of concept that depression can be diagnosed automatically from audio recordings of individuals' voices. The DAIC-WOZ database was used as the data source. Audio and textual data were preprocessed and converted into a set of optimized parameters for two models. Deep Learning models were used to detect depression from the transcripts of the audio recordings and from voice quality features. We created a word-level text analysis model using Natural Language Processing (NLP) techniques, and a voice quality analysis model on the tense-to-breathy dimension. The text analysis model performed best with an F1-score of 0.8 (0.42) for non-depressed (depressed) individuals, while the voice quality model scored 0.76 (0.38). The result is two models to be implemented in a system for the diagnosis of depression.
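The abstract reports a separate F1-score for each class (non-depressed and depressed). As a minimal sketch of how such per-class scores are typically computed, the Python below derives precision, recall, and F1 for one label at a time. The function name `f1_per_class` and the label lists are hypothetical and purely illustrative; they are not the authors' code or data, and the printed values do not reproduce the paper's results.

```python
def f1_per_class(y_true, y_pred, label):
    """Return the F1-score obtained when `label` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    if tp == 0:
        # No correct predictions for this class: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 0 = non-depressed, 1 = depressed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

f1_non_depressed = f1_per_class(y_true, y_pred, 0)
f1_depressed = f1_per_class(y_true, y_pred, 1)
print(f"F1 non-depressed: {f1_non_depressed:.3f}")
print(f"F1 depressed:     {f1_depressed:.3f}")
```

Reporting the two scores side by side, as the abstract does, shows how a classifier can score well on the majority class while still missing many cases in the minority class.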
Pages: 1843-1848
Number of pages: 6