Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection

Cited by: 0
Authors
Xiang, Yan [1]
Berisha, Visar [1, 2]
Liss, Julie [2]
Chakrabarti, Chaitali [1]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
[2] Arizona State Univ, Coll Hlth Solut, Tempe, AZ USA
Source
Keywords
Dysarthria detection; speech processing; deep neural network; multi-task learning; DISEASE;
DOI
10.21437/Interspeech.2024-1563
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech analytic models based on deep learning are popular in clinical diagnostics. However, constraints on clinical data collection and sharing place limits on available dataset sizes, which adversely impacts trained model performance. Multi-task learning (MTL) has been utilized to mitigate the effect of limited sample size by jointly training on multiple tasks that are considered to be related. However, discrepancies between clinical and non-clinical tasks can reduce MTL efficiency and can even cause it to fail, especially when there are gradient conflicts. In this paper, we enhance the performance of dysarthria detection by using MTL with an auxiliary task of learning speaker embeddings. We propose a task-specific gradient projection method to overcome gradient conflicts. Our evaluation shows that the proposed MTL paradigm outperforms both single-task learning and conventional MTL under different data availability settings.
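To make the notion of a gradient conflict concrete: two task gradients conflict when their inner product is negative, so a step that helps one task hurts the other. The abstract does not spell out the paper's task-specific projection rule, so the following is only a minimal sketch of a generic PCGrad-style projection, not the authors' method. It assumes (hypothetically) that only the auxiliary speaker-embedding gradient `g_aux` is projected, and only when it conflicts with the dysarthria-detection gradient `g_main`; all function and variable names are illustrative.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_if_conflicting(g_aux: Vector, g_main: Vector) -> Vector:
    """Remove from g_aux the component that opposes g_main.

    Assumption: a conflict means dot(g_aux, g_main) < 0. In that case
    g_aux is projected onto the plane orthogonal to g_main; otherwise
    it is returned unchanged.
    """
    inner = dot(g_aux, g_main)
    norm_sq = dot(g_main, g_main)
    if inner >= 0.0 or norm_sq == 0.0:
        return list(g_aux)  # no conflict, or no main-task direction
    scale = inner / norm_sq
    return [a - scale * m for a, m in zip(g_aux, g_main)]


def combined_update(g_main: Vector, g_aux: Vector) -> Vector:
    """Sum the main-task gradient and the (possibly projected) auxiliary one."""
    g_aux_proj = project_if_conflicting(g_aux, g_main)
    return [m + a for m, a in zip(g_main, g_aux_proj)]


# Toy gradients over two shared parameters (made-up values).
g_main = [1.0, 0.0]
g_aux = [-1.0, 1.0]  # points against g_main along the first axis
update = combined_update(g_main, g_aux)
```

In this toy case the projected auxiliary gradient is orthogonal to `g_main`, so the shared-parameter update no longer cancels the main task's descent direction. Projecting in one direction only is a design choice made here for illustration; symmetric schemes project each task's gradient against the other.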
Pages: 902 - 906
Number of pages: 5
Related papers
50 items in total
  • [41] Adaptive multi-task learning for speech to text translation
    Feng, Xin
    Zhao, Yue
    Zong, Wei
    Xu, Xiaona
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [42] Online Multi-Task Learning for Policy Gradient Methods
    Ammar, Haitham Bou
    Eaton, Eric
    Ruvolo, Paul
    Taylor, Matthew E.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1206 - 1214
  • [43] A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
    Tang, Jiyang
    Chen, William
    Chang, Xuankai
    Watanabe, Shinji
    MacWhinney, Brian
    INTERSPEECH 2023, 2023, : 1528 - 1532
  • [44] Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning
    Wen, Zhengqi
    Li, Kehuang
    Huang, Zhen
    Lee, Chin-Hui
    Tao, Jianhua
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90 (07) : 1025 - 1037
  • [45] Improving Evidential Deep Learning via Multi-Task Learning
    Oh, Dongpin
    Shin, Bonggun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7895 - 7903
  • [46] Speech-based detection of multi-class Alzheimer's disease classification using machine learning
    Tripathi, Tripti
    Kumar, Rakesh
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01) : 83 - 96
  • [47] Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
    Jain, Abhinav
    Upreti, Minali
    Jyothi, Preethi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2454 - 2458
  • [48] GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning
    Dong, Xin
    Wu, Ruize
    Xiong, Chao
    Li, Hai
    Cheng, Lei
    He, Yong
    Qian, Shiyou
    Cao, Jian
    Mo, Linjian
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 386 - 395
  • [49] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao Huijuan
    Ye Ning
    Wang Ruchuan
    Journal of Signal Processing Systems, 2021, 93 : 299 - 308
  • [50] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
    Zhang, Yu
    Zhang, Pengyuan
    Yan, Yonghong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861