Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion

Cited by: 15

Authors:

Sivaraman, Ganesh [1]

Mitra, Vikramjit [2]

Nam, Hosung [3]

Tiede, Mark [4]

Espy-Wilson, Carol [1]

Affiliations:

[1] Univ Maryland, College Pk, MD 20742 USA

[2] SRI Int, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA

[3] Korea Univ, Dept English Language & Literature, Seoul, South Korea

[4] Haskins Labs Inc, New Haven, CT USA

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

U.S. National Science Foundation;

Keywords:

Acoustic to articulatory speech inversion; speaker normalization; Vocal Tract Length Normalization;

DOI:

10.21437/Interspeech.2016-1399

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Speech inversion is a well-known ill-posed problem, and the addition of speaker differences typically makes it even harder. This paper investigates a vocal tract length normalization (VTLN) technique that transforms the acoustic space of different speakers to a target speaker space such that speaker-specific details are minimized. The speaker-normalized features are then used to train a feed-forward neural network based acoustic-to-articulatory speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients, and the articulatory features are represented by six tract-variable (TV) trajectories. Experiments are performed with ten speakers from the University of Wisconsin X-ray microbeam database. Speaker-dependent speech inversion systems are trained for each speaker as baselines against which the speaker-independent approach is compared. For each target speaker, data from the remaining nine speakers are transformed using the proposed approach, and the transformed features are used to train a speech inversion system. The performances of the individual systems are compared using the correlation between the estimated and the actual TVs on the target speaker's test set. Results show that the proposed speaker normalization approach provides a 7% absolute improvement in correlation compared with the system where speaker normalization was not performed.
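Note: the evaluation step described in the abstract (correlation between estimated and actual TVs on a test set) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the correlation is the Pearson product-moment correlation computed per TV and then averaged; the function names and the TV keys are hypothetical placeholders, and trajectories are plain Python lists.

```python
import math


def pearson(estimated, actual):
    """Pearson correlation between two equal-length trajectories (assumed metric)."""
    if len(estimated) != len(actual) or len(estimated) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_e * var_a)


def mean_tv_correlation(estimated_tvs, actual_tvs):
    """Per-TV correlations and their average.

    Both arguments map a TV name (hypothetical keys) to a list of samples.
    """
    scores = {
        name: pearson(estimated_tvs[name], actual_tvs[name])
        for name in actual_tvs
    }
    return scores, sum(scores.values()) / len(scores)
```

Under these assumptions, a system would be scored by calling `mean_tv_correlation` once per target speaker with the network's estimated trajectories and the measured trajectories for that speaker's test set.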

Citation

Pages: 455-459

Page count: 5

50 records in total

[31] Incorporation of phonetic constraints in acoustic-to-articulatory inversion
Potard, Blaise
Laprie, Yves
Ouni, Slim
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (04): 2310-2323
[32] A generalized smoothness criterion for acoustic-to-articulatory inversion
Ghosh, Prasanta Kumar
Narayanan, Shrikanth
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 128 (04): 2162-2172
[33] A DEEP RECURRENT APPROACH FOR ACOUSTIC-TO-ARTICULATORY INVERSION
Liu, Peng
Yu, Quanjie
Wu, Zhiyong
Kang, Shiyin
Meng, Helen
Cai, Lainhong
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4450-4454
[34] Acoustic-to-Articulatory Inversion based on Local Regression
Al Moubayed, Samer
Ananthakrishnan, G.
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 937-940
[35] Speech modelling based on acoustic-to-articulatory mapping
Schoentgen, J
[J]. NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445: 114-135
[36] ACOUSTIC-TO-ARTICULATORY INVERSION USING AN EPISODIC MEMORY
Demange, S.
Ouni, S.
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 4620-4623
[37] Acoustic-to-Articulatory Inversion with Deep Autoregressive Articulatory-WaveNet
Bozorg, Narjes
Johnson, Michael T.
[J]. INTERSPEECH 2020, 2020: 3725-3729
[38] A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion
Shahrebabaki, Abdolreza Sabzi
Siniscalchi, Sabato Marco
Salvi, Giampiero
Svendsen, Torbjorn
[J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021
[39] Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition
Wang, Jun
Samal, Ashok
Green, Jordan R.
[J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014: 1179-1183
[40] PERFORMANCES OF UNSUPERVISED HMM IN ACOUSTIC-TO-ARTICULATORY INVERSION
Lachambre, Helene
Koenig, Lionel
Andre-Obrecht, Regine
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7140-7144
