Adversarial-Free Speaker Identity-Invariant Representation Learning for Automatic Dysarthric Speech Classification

Cited by: 1
Authors
Janbakhshi, Parvaneh [1, 2]
Kodrasi, Ina [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
Funding
Swiss National Science Foundation
Keywords
Parkinson's disease; speaker identity; feature separation; supervised autoencoder; mutual information
DOI
10.21437/Interspeech.2022-402
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech representations that are robust to pathology-unrelated cues, such as speaker identity information, have been shown to be advantageous for automatic dysarthric speech classification. A recently proposed technique for learning speaker identity-invariant representations for dysarthric speech classification is based on adversarial training. However, adversarial training can be challenging, unstable, and sensitive to training parameters. To avoid adversarial training, in this paper we propose to learn speaker identity-invariant representations using a feature separation framework that relies on mutual information minimization. Experimental results on a database of neurotypical and dysarthric speech show that the proposed adversarial-free framework successfully learns speaker identity-invariant representations. Further, these representations yield dysarthric speech classification performance similar to that of representations obtained using adversarial training, while the training procedure is more stable and less sensitive to training parameters.
Pages: 2138-2142
Number of pages: 5
Related papers
7 items in total
  • [1] SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION
    Wang, Disong
    Liu, Songxiang
    Wu, Xixin
    Lu, Hui
    Sun, Lifa
    Liu, Xunying
    Meng, Helen
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6677 - 6681
  • [2] SPEAKER-INVARIANT AFFECTIVE REPRESENTATION LEARNING VIA ADVERSARIAL TRAINING
    Li, Haoqi
    Tu, Ming
    Huang, Jing
    Narayanan, Shrikanth
    Georgiou, Panayiotis
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7144 - 7148
  • [3] Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems
    Villalba, Jesus
    Joshi, Sonal
    Zelasko, Piotr
    Dehak, Najim
    [J]. INTERSPEECH 2021, 2021, : 4304 - 4308
  • [4] Automatic Recognition of Connected Vowels Only Using Speaker-invariant Representation of Speech Dynamics
    Asakawa, Satoshi
    Minematsu, Nobuaki
    Hirose, Keikichi
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2352 - +
  • [5] LARGE-SCALE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEAKER VERIFICATION
    Chen, Zhengyang
    Chen, Sanyuan
    Wu, Yu
    Qian, Yao
    Wang, Chengyi
    Liu, Shujie
    Qian, Yanmin
    Zeng, Michael
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6147 - 6151
  • [6] Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning
    Wu, Long
    Chen, Hangting
    Wang, Li
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. INTERSPEECH 2019, 2019, : 431 - 435
  • [7] Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System
    Chang, Su-Yu
    Wu, Kai-Cheng
    Chen, Chia-Ping
    [J]. INTERSPEECH 2019, 2019, : 1063 - 1067