Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders

Authors
Lau, Hok Shing [1]
Huntly, Mark [1]
Morgan, Nathon [1]
Iyenoma, Adesua [1]
Zeng, Biao [2]
Bashford, Tim [1]
Affiliations
[1] Univ Wales Trinity St David, Wales Inst Digital Informat, Swansea, W Glam, Wales
[2] Univ South Wales, Psychol Dept, Pontypridd, M Glam, Wales
Keywords
Speech Biomarker; Interpretable Machine Learning; Voice Disorder Detection
DOI
10.1007/978-3-031-67278-1_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech carries information that is clinically relevant to certain diseases and could therefore be used for health assessment. Recent work has shown growing interest in applying deep learning algorithms, especially large pretrained speech models, to Automatic Speech Assessment. One question that has not yet been explored is how these models arrive at their outputs from their inputs. In this work, we train and compare two configurations of the Audio Spectrogram Transformer [1] for Voice Disorder Detection and apply the attention rollout method [2] to produce model relevance maps, which give the computed relevance of each spectrogram region to the model's prediction. We use these maps to analyse how the models make predictions under different conditions, and show that the spread of attention decreases as a model is fine-tuned and that the model's attention becomes concentrated on specific phoneme regions.
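Attention rollout, as generally described, turns the per-layer attention weights of a transformer into a single map of how much each input token contributes to the output token. The following is a minimal NumPy sketch of that recursion (average the heads, add the identity to account for residual connections, re-normalise the rows, multiply across layers); it is not the authors' implementation. The 0.5 residual weighting, the two leading special tokens ([CLS] and a distillation token, as in AST's DeiT-style backbone), the row-major frequency-by-time patch ordering, and the helper names are all assumptions made for illustration, and the attention weights below are random toy data rather than real model outputs.

```python
import numpy as np


def attention_rollout(attentions, residual_weight=0.5):
    """Compose per-layer attention maps into one token-to-token rollout matrix.

    attentions: list of arrays shaped (num_heads, num_tokens, num_tokens),
        ordered from the first to the last transformer layer; each row is
        assumed to be a softmax distribution (rows sum to 1).
    residual_weight: weight on the attention term; the remainder goes to the
        identity matrix that models the residual connection (0.5 is assumed).
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        # Average the heads of this layer into a single attention matrix.
        attn = layer_attn.mean(axis=0)
        # Mix in the identity to account for the residual connection.
        attn = residual_weight * attn + (1.0 - residual_weight) * identity
        # Re-normalise so every row is again a probability distribution.
        attn = attn / attn.sum(axis=-1, keepdims=True)
        # Later layers multiply on the left of the accumulated product.
        rollout = attn @ rollout
    return rollout


def cls_relevance_map(rollout, grid_shape, num_special_tokens=2):
    """Hypothetical helper: turn the [CLS] row into a patch-grid relevance map.

    Assumes token 0 is [CLS], the first `num_special_tokens` tokens are
    special, and the remaining tokens are spectrogram patches in row-major
    (frequency, time) order.
    """
    relevance = rollout[0, num_special_tokens:]
    relevance = relevance.reshape(grid_shape)
    # Scale so the most relevant patch has value 1 (all entries are >= 0).
    return relevance / relevance.max()


# Toy usage with random softmax attention (not real model outputs).
rng = np.random.default_rng(0)
num_layers, num_heads, grid = 4, 3, (2, 5)
num_tokens = 2 + grid[0] * grid[1]
attentions = []
for _ in range(num_layers):
    logits = rng.normal(size=(num_heads, num_tokens, num_tokens))
    weights = np.exp(logits)
    attentions.append(weights / weights.sum(axis=-1, keepdims=True))

rollout = attention_rollout(attentions)
relevance_map = cls_relevance_map(rollout, grid)
print(relevance_map.shape)  # (2, 5)
```

Averaging over heads is only one option; some rollout implementations fuse heads with a maximum instead, or discard the lowest attention weights before composing layers, and these choices change how concentrated the resulting map looks.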
Pages: 59-72
Number of pages: 14