Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

Cited by: 0
Authors
Zhang, Xu [1 ]
Zhang, Xiangcheng [2 ]
Chen, Weisi [1 ]
Li, Chenlong [2 ]
Yu, Chengyuan [3 ]
Affiliations
[1] Xiamen Univ Technol, Sch Software Engn, Xiamen 361024, Peoples R China
[2] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen 361024, Peoples R China
[3] Jiangxi Agr Univ, Sch Comp & Informat Engn, Nanchang 330045, Peoples R China
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
TIME; NETWORK;
DOI
10.1038/s41598-024-60278-1
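Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];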
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Depression, a pervasive global mental disorder, profoundly impacts daily life. Despite numerous deep learning studies on depression detection through speech analysis, the shortage of large volumes of annotated samples hampers the development of effective models. To address this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome the constraints imposed by limited resources. For feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate high-level features at the segment level, thereby enhancing the model's capability to capture temporal relationships within audio frames. For prediction, we integrate LSTM and self-attention mechanisms, which assign greater weights to segments associated with depression and thereby improve the model's ability to discern depression-related information. The experimental results indicate that our model achieves F1 scores of 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset, outperforming recent baseline models in speech-based depression detection. This provides a promising solution for effective depression detection in low-resource environments.
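The attention pooling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes or scoring function, so the dot-product scoring and the names `attention_pool`, `frames`, and `w` are assumptions made only for illustration. The sketch collapses a sequence of frame-level feature vectors into one segment-level vector by weighting each frame with a softmax-normalized score.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Pool T frame vectors (each of dimension D) into one D-dim vector.

    frames: list of T lists of floats (frame-level features).
    w: list of D floats, a hypothetical learned scoring vector (assumption).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame (dot product with w).
    scores = [sum(x * y for x, y in zip(frame, w)) for frame in frames]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames, computed per feature dimension.
    pooled = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, weights
```

With a zero scoring vector every frame receives the same weight, so the result reduces to mean pooling; a trained scoring vector would instead emphasize the frames it scores highest.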
Pages: 14
Related papers
50 in total
  • [21] Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation
    Fukuda, Ryo
    Sudoh, Katsuhito
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 906 - 916
  • [22] MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0
    Sharma, Mayank
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6907 - 6911
  • [23] Speech Emotion Recognition Based on Shallow Structure of Wav2vec 2.0 and Attention Mechanism
    Zhang, Yumei
    Jia, Maoshen
    Cao, Xuan
    Zhao, Zichen
    2024 IEEE 14TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, ISCSLP 2024, 2024, : 398 - 402
  • [24] PROFICIENCY ASSESSMENT OF L2 SPOKEN ENGLISH USING WAV2VEC 2.0
    Banno, Stefano
    Matassoni, Marco
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1088 - 1095
  • [25] Improving Tone Recognition Performance using Wav2vec 2.0-Based Learned Representation in Yoruba, a Low-Resourced Language
    Obiang, Saint germes b. bengono
    Tsopze, Norbert
    Yonta, Paulin melatagia
    Bonastre, Jean-francois
    Jimenez, Tania
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (12)
  • [26] WavBERT: Exploiting Semantic and Non-semantic Speech using Wav2vec and BERT for Dementia Detection
    Zhu, Youxiang
    Obyat, Abdelrahman
    Liang, Xiaohui
    Batsis, John A.
    Roth, Robert M.
    INTERSPEECH 2021, 2021, : 3790 - 3794
  • [27] W2V2-Light: A Lightweight Version of Wav2vec 2.0 for Automatic Speech Recognition
    Kim, Dong-Hyun
    Lee, Jae-Hong
    Mo, Ji-Hwan
    Chang, Joon-Hyuk
    INTERSPEECH 2022, 2022, : 3038 - 3042
  • [28] Using Speaker-Specific Emotion Representations in Wav2vec 2.0-Based Modules for Speech Emotion Recognition
    Park, Somin
    Mark, Mpabulungi
    Park, Bogyung
    Hong, Hyunki
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 77 (01): : 1009 - 1030
  • [29] An improved wav2vec 2.0 pre-training approach using enhanced local dependency modeling for speech recognition
    Zhu, Qiu-shi
    Zhang, Jie
    Wu, Ming-hui
    Fang, Xin
    Dai, Li-Rong
    INTERSPEECH 2021, 2021, : 4334 - 4338
  • [30] Exploring the potential of Wav2vec 2.0 for speech emotion recognition using classifier combination and attention-based feature fusion
    Nasersharif, Babak
    Namvarpour, Mohammad
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (16): : 23667 - 23688