The Detection of Depression Using Multimodal Models Based on Text and Voice Quality Features

Cited by: 11
Authors
Solieman, Hanadi [1]
Pustozerov, Evgenii A. [1, 2]
Affiliations
[1] St Petersburg Electrotech Univ LETI, St Petersburg, Russia
[2] Almazov Natl Med Res Ctr, St Petersburg, Russia
Keywords
Depression; Deep Learning; text analysis; voice quality; semi-contextual; word-level; speaker-independent; DAIC-WOZ; CLASSIFICATION
DOI
10.1109/ElConRus51938.2021.9396540
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
This article demonstrates, as a proof of concept, that depression can be diagnosed automatically from audio recordings of individuals' voices. The DAIC-WOZ database was used as the data source. Audio and textual data were preprocessed and converted into a set of optimized parameters for two models. Deep Learning models were then used to detect depression from the transcripts of the audio recordings and from voice quality features. We created a word-level text analysis model using Natural Language Processing (NLP) techniques, and a voice quality analysis model along the tense-to-breathy dimension. The text analysis model achieved its best performance with an F1-score of 0.8 (0.42) for non-depressed (depressed) individuals, while the voice quality model scored 0.76 (0.38). The result is two models that can be implemented in a system for the diagnosis of depression.
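The abstract reports one F1-score per class, with the depressed class in parentheses. As a minimal sketch of how such per-class scores are usually computed, the Python below treats each class in turn as the positive class. The function name `f1_per_class`, the label encoding (0 = non-depressed, 1 = depressed), and the toy labels are assumptions made for illustration; they are not data or code from the paper.

```python
def f1_per_class(y_true, y_pred, positive):
    """F1-score with `positive` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct predictions for this class: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 0 = non-depressed, 1 = depressed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

f1_non_depressed = f1_per_class(y_true, y_pred, positive=0)
f1_depressed = f1_per_class(y_true, y_pred, positive=1)
print(f"F1 non-depressed: {f1_non_depressed:.2f}, depressed: {f1_depressed:.2f}")
```

Reporting both values matters on imbalanced data: a model can score well on the majority (non-depressed) class while missing many minority-class cases, which is the pattern the two figures in the abstract suggest.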
Pages: 1843-1848
Number of pages: 6