The effect of speech pathology on automatic speaker verification: a large-scale study

Cited by: 0
Authors
Soroosh Tayebi Arasteh
Tobias Weise
Maria Schuster
Elmar Noeth
Andreas Maier
Seung Hee Yang
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg, Pattern Recognition Lab
[2] Friedrich-Alexander-Universität Erlangen-Nürnberg, Speech & Language Processing Lab
[3] University Hospital RWTH Aachen, Department of Diagnostic and Interventional Radiology
[4] Ludwig-Maximilians-Universität München, Department of Otorhinolaryngology, Head and Neck Surgery
Source
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n = 3800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89 ± 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system’s performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness.
In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
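The abstract reports performance as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate of a verification system coincide. The following is a minimal illustrative sketch of that metric, not the authors' evaluation pipeline: the function name `equal_error_rate` and the toy similarity scores are hypothetical, and it assumes that a higher score means "same speaker".

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold.

    genuine:  similarity scores of same-speaker trials (assumed: higher = more similar)
    impostor: similarity scores of different-speaker trials
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance: impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection: genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

In this toy case one impostor trial is accepted and one genuine trial is rejected at the chosen threshold, so both error rates are 25%. A lower EER means speakers are easier to tell apart, which in the privacy setting of the abstract corresponds to a higher re-identification risk.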
Related papers
50 in total
  • [1] The effect of speech pathology on automatic speaker verification: a large-scale study
    Tayebi Arasteh, Soroosh
    Weise, Tobias
    Schuster, Maria
    Noeth, Elmar
    Maier, Andreas
    Yang, Seung Hee
    [J]. SCIENTIFIC REPORTS, 2023, 13 (01)
  • [2] LARGE-SCALE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEAKER VERIFICATION
    Chen, Zhengyang
    Chen, Sanyuan
    Wu, Yu
    Qian, Yao
    Wang, Chengyi
    Liu, Shujie
    Qian, Yanmin
    Zeng, Michael
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6147-6151
  • [3] Voxceleb: Large-scale speaker verification in the wild
    Nagrani, Arsha
    Chung, Joon Son
    Xie, Weidi
    Zisserman, Andrew
    [J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
  • [4] SCHEME FOR SPEECH PROCESSING IN AUTOMATIC SPEAKER VERIFICATION
    DAS, SK
    MOHN, WS
    [J]. IEEE TRANSACTIONS ON AUDIO AND ELECTROACOUSTICS, 1971, AU19 (01): : 32 - &
  • [5] ASSESSMENT OF AUTOMATIC SPEAKER VERIFICATION ON LOSSY TRANSCODED SPEECH
    Polacky, Jozef
    Jarina, Roman
    Chmulik, Michal
    [J]. 2016 4TH INTERNATIONAL WORKSHOP ON BIOMETRICS AND FORENSICS (IWBF), 2016
  • [6] LARGE-SCALE SPEAKER IDENTIFICATION
    Schmidt, Ludwig
    Sharifi, Matthew
    Moreno, Ignacio Lopez
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [7] Study on Speaker Verification on Emotional Speech
    Wu, Wei
    Zheng, Thomas Fang
    Xu, Ming-Xing
    Bao, Huan-Jun
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 2102-2105
  • [8] AN INVESTIGATION OF MONOTONIC TRANSDUCERS FOR LARGE-SCALE AUTOMATIC SPEECH RECOGNITION
    Moritz, Niko
    Seide, Frank
    Le, Duc
    Mahadeokar, Jay
    Fuegen, Christian
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 324-330
  • [9] Automatic Speech Recognition of Vietnamese for a New Large-Scale Corpus
    Tran, Linh Thi Thuc
    Kim, Han-Gyu
    La, Hoang Minh
    Pham, Su Van
    [J]. ELECTRONICS, 2024, 13 (05)
  • [10] Speech Recognition with Large-Scale Speaker-Class-Based Acoustic Modeling
    Konno, Kazuki
    Kato, Masaharu
    Kosaka, Tetsuo
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013