Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion

Cited by: 21
Authors
Sivaraman, Ganesh [1 ]
Mitra, Vikramjit [1 ]
Nam, Hosung [2 ]
Tiede, Mark [3 ]
Espy-Wilson, Carol [1 ]
Affiliations
[1] Univ Maryland, Elect & Comp Engn, College Pk, MD 20740 USA
[2] Korea Univ, Seoul, South Korea
[3] Haskins Labs Inc, New Haven, CT 06511 USA
Source
Funding
US National Science Foundation;
Keywords
VOCAL-TRACT; MOVEMENTS;
DOI
10.1121/1.5116130
CLC number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Speech inversion is a well-known ill-posed problem, and the addition of speaker differences typically makes it even harder. Normalizing speaker differences is essential for effectively using multi-speaker articulatory data to train a speaker independent speech inversion system. This paper explores a vocal tract length normalization (VTLN) technique that transforms the acoustic features of different speakers to a target speaker's acoustic space such that speaker specific details are minimized. The speaker normalized features are then used to train a deep feed-forward neural network based speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients. The articulatory features are represented by six tract-variable (TV) trajectories, which are relatively speaker invariant compared to flesh-point data. Experiments are performed with ten speakers from the University of Wisconsin X-ray microbeam database. Results show that the proposed speaker normalization approach provides an 8.15% relative improvement in correlation between actual and estimated TVs compared to a system without speaker normalization. To determine the efficacy of the method across datasets, cross-speaker evaluations were performed on speakers from the Multichannel Articulatory-TIMIT and EMA-IEEE datasets. Results show that the VTLN approach improves performance even across datasets. (C) 2019 Acoustical Society of America.
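The abstract does not give the paper's exact warping implementation, but VTLN is conventionally realized as a piecewise-linear warp of the frequency axis before mel filtering. The sketch below illustrates that conventional form only; the function names, the 0.85 cutoff ratio, and the application to a raw magnitude spectrum are illustrative assumptions, not details from the paper.

```python
import numpy as np

def vtln_warp(freqs, alpha, f_max, cut_ratio=0.85):
    """Piecewise-linear VTLN warp (illustrative sketch, not the paper's code).

    Frequencies below a cutoff are scaled by the warp factor alpha;
    above it, a linear segment maps the remainder so that f_max maps
    exactly to f_max, keeping the warp invertible.
    """
    freqs = np.asarray(freqs, dtype=float)
    f_cut = cut_ratio * min(1.0, 1.0 / alpha) * f_max  # hypothetical cutoff choice
    upper_slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return np.where(
        freqs <= f_cut,
        alpha * freqs,
        alpha * f_cut + upper_slope * (freqs - f_cut),
    )

def warp_spectrum(spectrum, alpha, sample_rate=16000):
    """Resample a magnitude spectrum along the warped frequency axis."""
    n = len(spectrum)
    f_max = sample_rate / 2
    freqs = np.linspace(0.0, f_max, n)
    # Read the original spectrum off at the warped frequency positions.
    return np.interp(vtln_warp(freqs, alpha, f_max), freqs, spectrum)
```

In a pipeline like the one described, a per-speaker alpha would be chosen (e.g. by grid search against the target speaker) and the warped spectra would then feed the usual mel filterbank and MFCC computation.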
Pages: 316-329
Page count: 14
Related papers
50 records in total
  • [1] An investigation on speaker specific articulatory synthesis with speaker independent articulatory inversion
    Illa, Aravind
    Ghosh, Prasanta Kumar
    [J]. INTERSPEECH 2019, 2019, : 121 - 125
  • [2] Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion
    Sivaraman, Ganesh
    Mitra, Vikramjit
    Nam, Hosung
    Tiede, Mark
    Espy-Wilson, Carol
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 455 - 459
  • [3] Autoregressive Articulatory WaveNet Flow for Speaker-Independent Acoustic-to-Articulatory Inversion
    Bozorg, Narjes
    Johnson, Michael T.
    Soleymanpour, Mohammad
    [J]. 2021 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2021, : 156 - 161
  • [4] Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model
    Hiroya, S
    Honda, M
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1071 - 1078
  • [5] Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion
    Ji, An
    Johnson, Michael T.
    Berry, Jeffrey J.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (10) : 1865 - 1875
  • [6] Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition
    Wang, Jun
    Samal, Ashok
    Green, Jordan R.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1179 - 1183
  • [7] Speaker-Independent Silent Speech Recognition with Across-Speaker Articulatory Normalization and Speaker Adaptive Training
    Wang, Jun
    Hahm, Seongjun
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2415 - 2419
  • [8] Towards a Speaker Independent Speech-BCI Using Speaker Adaptation
    Dash, Debadatta
    Wisler, Alan
    Ferrari, Paul
    Wang, Jun
    [J]. INTERSPEECH 2019, 2019, : 864 - 868
  • [9] Speaker Adaptation of an Acoustic-Articulatory Inversion Model using Cascaded Gaussian Mixture Regressions
    Hueber, Thomas
    Bailly, Gerard
    Badin, Pierre
    Elisei, Frederic
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2752 - 2756
  • [10] An Acoustic-Phonetic-Based Speaker Adaptation Technique for Improving Speaker-Independent Continuous Speech Recognition
    Zhao, Yunxin
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (03): : 380 - 394