Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement

Cited by: 4
Authors
Ravi, Vijay [1]
Wang, Jinhan [1]
Flint, Jonathan [2]
Alwan, Abeer [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Psychiat & Biobehav Sci, Los Angeles, CA 90095 USA
Source
Funding
U.S. National Institutes of Health
Keywords
Depression-detection; Speaker-disentanglement; Privacy; INDICATORS; DIAGNOSIS; FEATURES;
DOI
10.1016/j.csl.2023.101605
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech signals are valuable biomarkers for assessing an individual's mental health, including the automatic identification of Major Depressive Disorder (MDD). A frequently used approach is to employ features related to speaker identity, such as speaker embeddings. However, over-reliance on speaker-identity features in mental health screening systems can compromise patient privacy. Moreover, some aspects of speaker identity may not be relevant for depression detection and could act as a bias factor that hampers system performance. To overcome these limitations, we propose disentangling speaker-identity information from depression-related information. Specifically, we present four distinct disentanglement methods: adversarial speaker identification (SID)-loss maximization (ADV), SID-loss equalization with variance (LEV), SID-loss equalization using cross-entropy (LECE), and SID-loss equalization using KL divergence (LEKLD). Our experiments, which incorporated diverse input features and model architectures, yielded improved F1 scores for MDD detection and improved voice-privacy attributes, as quantified by Gain in Voice Distinctiveness (G_VD) and De-Identification scores (DeID). On the DAIC-WOZ dataset (English), LECE using ComparE16 features achieves the best F1 score of 80%, which is the audio-only state-of-the-art (SOTA) F1 score for depression detection, along with a G_VD of -1.1 dB and a DeID of 85%. On the EATD dataset (Mandarin), ADV using the raw audio signal achieves an F1 score of 72.38%, surpassing the multi-modal SOTA, along with a G_VD of -0.89 dB and a DeID of 51.21%. By reducing the dependence on speaker-identity-related features, our method offers a promising direction for speech-based depression detection that preserves patient privacy.
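The abstract names the four disentanglement objectives but does not give their formulas. The Python sketch below is one plausible reading, not the authors' implementation. It assumes a model that outputs a posterior over training speakers, and adds a weighted speaker penalty to the depression loss. ADV rewards a high SID loss, while the three equalization variants push the speaker posterior toward a uniform distribution. The function names, the weight `lam`, and the direction of the KL term are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # numerical guard for log(0); an illustrative choice


def variance(p):
    """Population variance of the speaker posterior entries."""
    mean = sum(p) / len(p)
    return sum((x - mean) ** 2 for x in p) / len(p)


def cross_entropy(target, pred):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return -sum(t * math.log(q + EPS) for t, q in zip(target, pred))


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping zero-mass terms."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS))
               for pi, qi in zip(p, q) if pi > 0)


def sid_penalty(sid_posterior, true_speaker, method):
    """Speaker penalty to MINIMIZE under each (assumed) objective."""
    n = len(sid_posterior)
    uniform = [1.0 / n] * n
    if method == "ADV":
        # Negative SID cross-entropy: minimizing it maximizes the SID loss.
        return math.log(sid_posterior[true_speaker] + EPS)
    if method == "LEV":
        # Zero when every speaker is predicted with equal probability.
        return variance(sid_posterior)
    if method == "LECE":
        # Cross-entropy against a uniform target instead of the true label.
        return cross_entropy(uniform, sid_posterior)
    if method == "LEKLD":
        # KL direction (uniform || posterior) is an assumption.
        return kl_divergence(uniform, sid_posterior)
    raise ValueError(f"unknown method: {method}")


def total_loss(mdd_loss, sid_posterior, true_speaker, method, lam=0.1):
    """Depression loss plus a weighted speaker-disentanglement penalty."""
    return mdd_loss + lam * sid_penalty(sid_posterior, true_speaker, method)


if __name__ == "__main__":
    peaked = [0.7, 0.2, 0.1]  # model is confident about speaker 0
    for name in ("ADV", "LEV", "LECE", "LEKLD"):
        print(name, total_loss(0.5, peaked, 0, name))
```

Under this reading, all four penalties are smaller for a uniform speaker posterior than for a peaked one, so minimizing the total loss discourages the model from encoding who is speaking.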
Pages: 24