Depression recognition using a proposed speech chain model fusing speech production and perception features

Cited by: 16
Authors
Du, Minghao [1]
Liu, Shuang [1]
Wang, Tao [1]
Zhang, Wenquan [1]
Ke, Yufeng [1]
Chen, Long [1]
Ming, Dong [1, 2]
Affiliations
[1] Tianjin Univ, Acad Med Engn & Translat Med, Tianjin Int Joint Res Ctr Neural Engn, Tianjin, Peoples R China
[2] Tianjin Univ, Dept Biomed Engn, Lab Neural Engn & Rehabil, Coll Precis Instruments & Optoelect Engn, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Depression; Deep learning; Audio; Feature fusion; Auxiliary diagnosis; DISORDER; MACHINE;
DOI
10.1016/j.jad.2022.11.060
Chinese Library Classification number
R74 [Neurology and Psychiatry];
Abstract
Background: The growing number of patients with depression puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features and ignore patients' vocal tract changes, which may partly explain their poor recognition performance.
Methods: This work proposes a novel machine speech chain model for depression recognition (MSCDR) that captures text-independent depressive speech representations along the path from the speaker's mouth to the listener's ear, with the aim of improving recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech generation and speech perception, respectively. A one-dimensional convolutional neural network and a long short-term memory network then sequentially capture intra- and inter-segment dynamic depressive features for classification.
Results: We tested the MSCDR on two public datasets with different languages and paradigms, namely the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. The accuracy of the MSCDR on the two datasets was 0.77 and 0.86, and the average F1 score was 0.75 and 0.86, both better than those of other existing methods. This improvement reveals the complementarity of speech production and perception features in carrying depressive information.
Limitations: The sample size was relatively small, which may limit clinical translation to some extent.
Conclusion: The experiments demonstrate the good generalization ability and superiority of the proposed MSCDR and suggest that vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
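The speech-production side of the model rests on LPC, which fits an all-pole filter to each speech frame as a rough description of the vocal tract. The sketch below is one common way to estimate LPC coefficients (the autocorrelation method with the Levinson-Durbin recursion), written in plain Python. The abstract does not state the estimation method, LPC order, or frame length used in the MSCDR, so the function names `autocorrelation` and `lpc`, the order, and the toy frame are all illustrative assumptions, not the authors' implementation. MFCC extraction and the CNN-LSTM classifier are not shown.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1.0 and the linear predictor is
    x[n] ~= -(a[1]*x[n-1] + ... + a[order]*x[n-order]);
    err is the residual prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    if r[0] == 0.0:
        # Silent frame: nothing to predict.
        return a, 0.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the coefficients from order i-1 to order i.
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Toy frame (assumption): a decaying exponential, which satisfies
# x[n] = 0.9 * x[n-1], so a first-order predictor suffices.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(frame, order=2)
print(coeffs, residual)
```

In a pipeline of the kind the abstract describes, coefficients like these would be computed per frame and combined with MFCCs from the same frames before being passed to the sequence model; how the two feature sets are fused is not specified in the abstract.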
Pages: 299-308
Number of pages: 10