Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

Authors
Zhang, Xu [1]
Zhang, Xiangcheng [2]
Chen, Weisi [1]
Li, Chenlong [2]
Yu, Chengyuan [3]
Affiliations
[1] Xiamen Univ Technol, Sch Software Engn, Xiamen 361024, Peoples R China
[2] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen 361024, Peoples R China
[3] Jiangxi Agr Univ, Sch Comp & Informat Engn, Nanchang 330045, Peoples R China
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Keywords
TIME; NETWORK;
DOI
10.1038/s41598-024-60278-1
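Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes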
07 ; 0710 ; 09 ;
Abstract
Depression, a pervasive global mental disorder, profoundly affects daily life. Although many deep learning studies have addressed depression detection through speech analysis, the shortage of large annotated datasets hampers the development of effective models. To address this challenge, our research introduces a transfer learning approach for detecting depression in speech under limited-resource constraints. For feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate high-level features at the segment level, enhancing the model's ability to capture temporal relationships among audio frames. For prediction, we combine LSTM and self-attention mechanisms, which assign greater weights to segments associated with depression and thereby improve the model's ability to discern depression-related information. Experimental results show that our model achieves F1 scores of 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset, outperforming recent baseline models in speech-based depression detection. This offers a promising solution for effective depression detection in low-resource environments.
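The attention pooling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes or scoring function, so the dot-product scoring, the `query` vector (standing in for a learned parameter), and the function names `softmax` and `attention_pool` are all assumptions made for illustration. It uses plain Python rather than a deep learning framework to stay self-contained.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Collapse frame-level features into one segment-level vector.

    frames: list of T feature vectors, each of length D (hypothetical
            stand-ins for frame features from an upstream encoder).
    query:  list of length D; an assumed learned scoring vector.
    Returns (pooled_vector, attention_weights).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames along the time axis.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights

# Toy example: three frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(frames, query=[10.0, 0.0])
```

With this query, frames whose first feature is large receive most of the weight, so the pooled vector is dominated by the first and third frames. A zero query would reduce the operation to plain mean pooling.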
Pages: 14