Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity

Cited by: 5
Authors
Dumpala, Sri Harsha [1 ,2 ]
Dikaios, Katerina [3 ,4 ]
Rodriguez, Sebastian [1 ,2 ]
Langley, Ross [3 ]
Rempel, Sheri [4 ]
Uher, Rudolf [3 ,4 ]
Oore, Sageev [1 ,2 ]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] Dalhousie Univ, Psychiat, Halifax, NS, Canada
[4] Nova Scotia Hlth, Halifax, NS, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research;
Keywords
FEATURES; VECTORS; PHQ-9; SCORE;
DOI
10.1038/s41598-023-35184-7
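Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code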
07 ; 0710 ; 09 ;
Abstract
The sound of a person's voice is commonly used to identify the speaker. The sound of speech is also starting to be used to detect medical conditions, such as depression. It is not known whether the manifestations of depression in speech overlap with those used to identify the speaker. In this paper, we test the hypothesis that the representations of personal identity in speech, known as speaker embeddings, improve the detection of depression and the estimation of depressive symptom severity. We further examine whether changes in depression severity interfere with the recognition of the speaker's identity. We extract speaker embeddings from models pre-trained on a large sample of speakers from the general population without information on depression diagnosis. We test these speaker embeddings for severity estimation in independent datasets consisting of clinical interviews (DAIC-WOZ), spontaneous speech (VocalMind), and longitudinal data (VocalMind). We also use the severity estimates to predict the presence of depression. Speaker embeddings, combined with established acoustic features (OpenSMILE), predicted severity with root mean square error (RMSE) values of 6.01 and 6.28 in DAIC-WOZ and VocalMind datasets, respectively, lower than acoustic features alone or speaker embeddings alone. When used to detect depression, speaker embeddings showed higher balanced accuracy (BAc) and surpassed previous state-of-the-art performance in depression detection from speech, with BAc values of 66% and 64% in DAIC-WOZ and VocalMind datasets, respectively. Results from a subset of participants with repeated speech samples show that speaker identification is affected by changes in depression severity. These results suggest that depression overlaps with personal identity in the acoustic space. While speaker embeddings improve depression detection and severity estimation, deterioration or improvement in mood may interfere with speaker verification.
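The two evaluation metrics named in the abstract, RMSE for severity estimation and balanced accuracy (BAc) for detection, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the scores are invented toy values rather than data from the paper, and the cutoff of 10 used to turn a severity estimate into a depressed/not-depressed label is a hypothetical choice, not one stated in this record.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and estimated severity scores."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity and specificity for binary labels."""
    tp = sum(1 for l, p in zip(labels, preds) if l and p)
    tn = sum(1 for l, p in zip(labels, preds) if not l and not p)
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical cutoff on the severity scale (an assumption, not from the record).
CUTOFF = 10

# Invented toy values: questionnaire scores and model estimates for six speakers.
true_scores = [2, 14, 7, 19, 11, 4]
est_scores = [5.0, 11.0, 9.0, 13.0, 8.0, 6.0]

# Severity estimates are thresholded to obtain a detection decision.
true_labels = [s >= CUTOFF for s in true_scores]
pred_labels = [s >= CUTOFF for s in est_scores]

severity_rmse = rmse(true_scores, est_scores)
detection_bac = balanced_accuracy(true_labels, pred_labels)
print(f"RMSE: {severity_rmse:.2f}, balanced accuracy: {detection_bac:.2%}")
```

Balanced accuracy averages the per-class recall, so it is not inflated when one class (for example, non-depressed speakers) dominates the sample, which is one plausible reason to prefer it over plain accuracy in this setting.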
Pages: 11
Related papers
1 item in total
  • [1] Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity
    Sri Harsha Dumpala
    Katerina Dikaios
    Sebastian Rodriguez
    Ross Langley
    Sheri Rempel
    Rudolf Uher
    Sageev Oore
    [J]. Scientific Reports, 13