Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement

Cited: 4
Authors
Ravi, Vijay [1 ]
Wang, Jinhan [1 ]
Flint, Jonathan [2 ]
Alwan, Abeer [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Psychiat & Biobehav Sci, Los Angeles, CA 90095 USA
Funding
US National Institutes of Health (NIH)
Keywords
Depression-detection; Speaker-disentanglement; Privacy; INDICATORS; DIAGNOSIS; FEATURES;
DOI
10.1016/j.csl.2023.101605
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Speech signals are valuable biomarkers for assessing an individual's mental health, including the automatic identification of Major Depressive Disorder (MDD). A frequently used approach in this regard is to employ features related to speaker identity, such as speaker embeddings. However, over-reliance on speaker identity features in mental health screening systems can compromise patient privacy. Moreover, some aspects of speaker identity may not be relevant for depression detection and could act as a bias factor that hampers system performance. To overcome these limitations, we propose disentangling speaker-identity information from depression-related information. Specifically, we present four distinct disentanglement methods: adversarial speaker identification (SID)-loss maximization (ADV), SID-loss equalization with variance (LEV), SID-loss equalization using cross-entropy (LECE), and SID-loss equalization using KL divergence (LEKLD). Our experiments, which incorporated diverse input features and model architectures, yielded improved F1-scores for MDD detection as well as improved voice-privacy attributes, quantified by Gain in Voice Distinctiveness (G(VD)) and De-Identification Score (DeID). On the DAIC-WOZ dataset (English), LECE using ComParE16 features achieves the best F1-score of 80%, which represents the audio-only SOTA for depression detection, along with a G(VD) of -1.1 dB and a DeID of 85%. On the EATD dataset (Mandarin), ADV using the raw audio signal achieves an F1-score of 72.38%, surpassing the multi-modal SOTA, along with a G(VD) of -0.89 dB and a DeID of 51.21%. By reducing the dependence on speaker-identity-related features, our method offers a promising direction for speech-based depression detection that preserves patient privacy.
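The adversarial (ADV) objective summarized in the abstract can be sketched as a combined loss in which the depression cross-entropy is minimized while the speaker-ID cross-entropy is maximized, pushing a shared encoder to discard speaker identity. This is a minimal NumPy illustration of that idea, not the authors' implementation; the two-head setup, the per-example formulation, and the weight `lam` are assumptions:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of a single example from raw logits."""
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax
    return -log_probs[label]

def adv_disentanglement_loss(dep_logits, dep_label,
                             sid_logits, sid_label, lam=0.1):
    """ADV-style objective: minimize the depression loss while
    *maximizing* the speaker-ID (SID) loss, i.e. gradient-reversal
    behavior expressed as a negated loss term."""
    l_dep = softmax_cross_entropy(dep_logits, dep_label)
    l_sid = softmax_cross_entropy(sid_logits, sid_label)
    return l_dep - lam * l_sid                # subtracting l_sid maximizes it

# toy example: 2-class depression head, 4-speaker SID head
dep_logits = np.array([2.0, -1.0])
sid_logits = np.array([0.5, 0.1, -0.2, 0.0])
loss = adv_disentanglement_loss(dep_logits, 0, sid_logits, 2, lam=0.1)
```

Because the SID term enters with a negative sign, descending on `loss` updates the encoder toward representations the speaker classifier cannot exploit; the LEV, LECE, and LEKLD variants replace this negated term with losses that equalize the SID posterior instead.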
Pages: 24
Related papers (50 records)
  • [1] Avoiding dominance of speaker features in speech-based depression detection
    Zuo, Lishi
    Mak, Man-Wai
    [J]. PATTERN RECOGNITION LETTERS, 2023, 173 : 50 - 56
  • [2] Speaker normalisation for speech-based emotion detection
    Sethu, Vidhyasaharan
    Ambikairajah, Eliathamby
    Epps, Julien
    [J]. PROCEEDINGS OF THE 2007 15TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, 2007, : 611+
  • [3] Enhancing Speech-Based Depression Detection Through Gender Dependent Vowel-Level Formant Features
    Cummins, Nicholas
    Vlasenko, Bogdan
    Sagha, Hesam
    Schuller, Bjoern
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, AIME 2017, 2017, 10259 : 209 - 214
  • [4] Assessing speaker independence on a speech-based depression level estimation system
    Lopez-Otero, Paula
    Docio-Fernandez, Laura
    Garcia-Mateo, Carmen
    [J]. PATTERN RECOGNITION LETTERS, 2015, 68 : 343 - 350
  • [5] Domain Adaptation for Enhancing Speech-based Depression Detection in Natural Environmental Conditions Using Dilated CNNs
    Huang, Zhaocheng
    Epps, Julien
    Joachim, Dale
    Stasak, Brian
    Williamson, James R.
    Quatieri, Thomas F.
    [J]. INTERSPEECH 2020, 2020, : 4561 - 4565
  • [6] Learning privacy-enhancing face representations through feature disentanglement
    Bortolato, Blaz
    Ivanovska, Marija
    Rot, Peter
    Krizaj, Janez
    Terhoerst, Philipp
    Damer, Naser
    Peer, Peter
    Struc, Vitomir
    [J]. 2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 495 - 502
  • [7] A study on speech disentanglement framework based on adversarial learning for speaker recognition
    Kwon, Yoohwan
    Chung, Soo-Whan
    Kang, Hong-Goo
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): : 447 - 453
  • [8] Speaker-turn aware diarization for speech-based cognitive assessments
    Xu, Sean Shensheng
    Ke, Xiaoquan
    Mak, Man-Wai
    Wong, Ka Ho
    Meng, Helen
    Kwok, Timothy C. Y.
    Gu, Jason
    Zhang, Jian
    Tao, Wei
    Chang, Chunqi
    [J]. FRONTIERS IN NEUROSCIENCE, 2024, 17
  • [9] Speech-based Evaluation of Emotions-Depression Correlation
    Verde, Laura
    Campanile, Lelio
    Marulli, Fiammetta
    Marrone, Stefano
    [J]. 2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 324 - 329
  • [10] Speaker Turn Aware Similarity Scoring for Diarization of Speech-Based Cognitive Assessments
    Xu, Sean Shensheng
    Mak, Man-Wai
    Wong, Ka Ho
    Meng, Helen
    Kwok, Timothy C. Y.
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1299 - 1304