Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

Cited by: 0
Authors
Zhang, Xu [1 ]
Zhang, Xiangcheng [2 ]
Chen, Weisi [1 ]
Li, Chenlong [2 ]
Yu, Chengyuan [3 ]
Affiliations
[1] Xiamen Univ Technol, Sch Software Engn, Xiamen 361024, Peoples R China
[2] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen 361024, Peoples R China
[3] Jiangxi Agr Univ, Sch Comp & Informat Engn, Nanchang 330045, Peoples R China
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, No. 1
Keywords
TIME; NETWORK;
DOI
10.1038/s41598-024-60278-1
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification
07; 0710; 09;
Abstract
Depression, a pervasive global mental disorder, profoundly affects daily life. Although numerous deep learning studies have addressed depression detection through speech analysis, the scarcity of large annotated datasets hampers the development of effective models. To address this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome the constraints imposed by limited resources. For feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention-pooling structures, we generate advanced segment-level features, enhancing the model's ability to capture temporal relationships across audio frames. For prediction, we combine LSTM and self-attention mechanisms; this combination assigns greater weights to segments associated with depression, sharpening the model's discernment of depression-related information. Experimental results show that our model achieves F1 scores of 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset, outperforming recent baseline models in speech-based depression detection. This provides a promising solution for effective depression detection in low-resource environments.
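The attention-pooling step described in the abstract, which collapses a sequence of frame-level wav2vec 2.0 features into a single segment-level vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring scheme (a dot product with a learnable vector `w`, softmaxed over time) and the toy dimensions are assumptions chosen only to show the mechanism.

```python
import math

def attention_pool(frames, w):
    """Collapse T frame-level feature vectors into one segment-level vector.

    frames: list of T feature vectors, each of length D
    w:      learnable scoring vector of length D (assumed scoring scheme)
    """
    # Scalar relevance score per frame: dot product with w.
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in frames]
    # Softmax over time (max-subtracted for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]  # attention weights, sum to 1
    # Segment feature = attention-weighted average of the frames.
    dim = len(frames[0])
    return [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dim)]

# Toy usage: 3 frames of 2-dim features; larger w-alignment -> larger weight.
segment = attention_pool([[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]], [1.0, 0.0])
```

With a zero scoring vector the weights become uniform and the pooled output reduces to the plain frame average; a trained `w` instead emphasizes frames whose features align with it, which is how depression-salient segments can receive greater weight.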
Pages: 14