Avoiding dominance of speaker features in speech-based depression detection

Cited by: 3
Authors
Zuo, Lishi [1]
Mak, Man-Wai [1]
Affiliation
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Depression detection; Speaker invariance; Feature disentanglement; Speaker embedding
DOI
10.1016/j.patrec.2023.07.016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The performance of speech-based depression detectors is limited by the scarcity and imbalance of depression data. We found that depression detectors could be strongly biased toward speaker features when the number of training speakers is insufficient. To address this issue, we propose a speaker-invariant depression detector (SIDD) that minimizes speaker information in the latent space. The SIDD consists of an autoencoder, a depression classifier, and a speaker-embedding projector. By incorporating speaker-embedding vectors into the autoencoder's latent vectors, the SIDD effectively removes speaker information from the representations used by the depression classifier. Experimental results show that minimizing speaker information yields significant improvements, and the proposed method generally outperforms previous approaches to depression detection on the DAIC-WOZ dataset.
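The abstract names the three SIDD components but does not say how they are trained together. The following Python sketch shows one hypothetical way to combine them into a single per-sample objective, assuming a mean-squared-error reconstruction term for the autoencoder, a binary cross-entropy term for the depression classifier, a linear speaker-embedding projector, and a squared-cosine penalty that pushes the latent vector toward orthogonality with the projected speaker embedding. The function names, the weights `alpha` and `beta`, and the penalty itself are illustrative assumptions, not the loss defined in the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(weights: Sequence[Sequence[float]], s: Sequence[float]) -> Vector:
    """Hypothetical linear speaker-embedding projector: returns W @ s."""
    return [sum(w * x for w, x in zip(row, s)) for row in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def speaker_penalty(z: Sequence[float], s_proj: Sequence[float]) -> float:
    """Assumed speaker-information penalty: squared cosine between the latent
    vector z and the projected speaker embedding (0 when orthogonal, 1 when parallel)."""
    return cosine(z, s_proj) ** 2


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Autoencoder reconstruction error (mean squared error)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for the depression classifier; p is clamped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sidd_style_loss(x, x_hat, z, p_dep, y_dep, speaker_emb, proj_weights,
                    alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical combined objective: reconstruction + depression
    classification + speaker-information penalty on the latent vector."""
    s_proj = project(proj_weights, speaker_emb)
    return (mse(x, x_hat)
            + alpha * bce(p_dep, y_dep)
            + beta * speaker_penalty(z, s_proj))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    s = [1.0, 0.0]
    print(speaker_penalty([0.0, 2.0], project(identity, s)))  # orthogonal latent
    print(speaker_penalty([3.0, 0.0], project(identity, s)))  # parallel latent
```

With an identity projector and speaker embedding `[1.0, 0.0]`, a latent vector `[0.0, 2.0]` incurs no speaker penalty, whereas `[3.0, 0.0]` incurs the maximum penalty of 1.0. A squared cosine is used here only because it is scale-invariant and bounded; the paper may measure or remove speaker information differently.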
Pages: 50-56
Number of pages: 7