Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

Cited by: 0
Authors
Zhang, Xu [1 ]
Zhang, Xiangcheng [2 ]
Chen, Weisi [1 ]
Li, Chenlong [2 ]
Yu, Chengyuan [3 ]
Affiliations
[1] Xiamen Univ Technol, Sch Software Engn, Xiamen 361024, Peoples R China
[2] Xiamen Univ Technol, Sch Comp & Informat Engn, Xiamen 361024, Peoples R China
[3] Jiangxi Agr Univ, Sch Comp & Informat Engn, Nanchang 330045, Peoples R China
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, No. 1
Keywords
TIME; NETWORK;
DOI
10.1038/s41598-024-60278-1
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline codes
07 ; 0710 ; 09 ;
Abstract
Depression, a pervasive global mental disorder, profoundly impacts daily life. Despite numerous deep learning studies focused on detecting depression through speech analysis, the shortage of large annotated datasets hampers the development of effective models. In response to this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome the constraints imposed by limited resources. For feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate segment-level features, enhancing the model's ability to capture temporal relationships across audio frames. For prediction, we combine LSTM and self-attention mechanisms; this combination assigns greater weight to segments associated with depression, sharpening the model's discernment of depression-related information. The experimental results show that our model achieves F1 scores of 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset, outperforming recent baseline models in speech-based depression detection. This provides a promising solution for effective depression detection in low-resource environments.
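The abstract's pooling step can be illustrated concretely. The paper's exact layer shapes and parameters are not given here, so the following is a minimal numpy sketch of attention pooling over frame-level features, assuming the features have already been extracted by an encoder such as wav2vec 2.0 (the feature dimension `D = 8` and the learnable vector `w` are illustrative placeholders, not values from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frames, w):
    """Collapse T frame vectors into one segment vector.

    frames: (T, D) frame-level features (assumed precomputed, e.g. by wav2vec 2.0)
    w:      (D,)   learnable scoring vector (random here for illustration)
    """
    scores = np.tanh(frames @ w)   # (T,) one relevance score per frame
    alpha = softmax(scores)        # (T,) attention weights, summing to 1
    return alpha @ frames          # (D,) weighted average = segment embedding

rng = np.random.default_rng(0)
T, D = 50, 8                       # 50 frames, 8-dim features (toy sizes)
frames = rng.normal(size=(T, D))
w = rng.normal(size=D)
segment = attention_pool(frames, w)
print(segment.shape)               # (8,)
```

In the full model described in the abstract, these segment embeddings would then be fed to the LSTM and self-attention layers, which reweight whole segments rather than individual frames.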
Pages: 14