Unsupervised Vocal-tract Length Estimation Through Model-based Acoustic-to-Articulatory Inversion

Authors
Cai, Shanqing [1,2]
Bunnell, H. Timothy [3,4]
Patel, Rupal [1]
Affiliations
[1] Northeastern Univ, Dept Commun Sci & Disorders, Boston, MA 02115 USA
[2] Boston Univ, Dept Speech Language & Hearing Sci, Boston, MA 02215 USA
[3] Univ Delaware, Nemours Biomed Res, Newark, DE 19716 USA
[4] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
Keywords
vocal-tract length; acoustic-to-articulatory inversion; global search; dynamic programming; speech production; normalization; space
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge of vocal-tract (VT) length is a logical prerequisite for acoustic-to-articulatory inversion, yet prior work has treated VT length estimation (VTLE) and inversion largely as separate problems. We describe a new VTLE algorithm based on acoustic-to-articulatory inversion. Our inversion process uses the Maeda model (MM) [1,2] and combines global search [3] with dynamic programming to transform speech waveforms into articulatory trajectories. The VTLE algorithm searches for the MM VT length that yields the most accurate and smoothest inversion result. The algorithm was tested on non-nasalized diphthongs (e.g., [ai]) synthesized with MM itself, synthesized with TubeTalker (a different VT model [4]), and recorded from child and adult speakers, and its performance was compared with that of a conventional formant-frequency-based method. Results on synthesized speech indicate that the inversion-based algorithm achieved greater VTLE accuracy and greater robustness against phonetic variation than the formant-based method. Furthermore, compared with the formant-based method, the inversion-based estimates correlated more strongly with an MRI-derived VT length measure in adults and were more consistent with previously reported relations between age and VT length in children [5].
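The length search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The Maeda model is replaced by a hypothetical toy forward model (`toy_formants`: a uniform tube closed at the glottis whose F1-F3 are perturbed by one articulatory parameter with made-up sensitivities `K`). The "global search" stage is only loosely mimicked by precomputing a codebook over a fixed articulatory grid. The cost (squared log-formant error plus a weighted squared-movement penalty, `smooth_weight`) is likewise an assumption. For each candidate length, dynamic programming finds the cheapest articulatory trajectory, and the length with the lowest total cost is returned. The uniform-tube formula L = (2n-1)c / (4F_n) is included as one example of a formant-based baseline; it is not necessarily the conventional method used in the paper.

```python
import math

SPEED_OF_SOUND = 35000.0  # cm/s, approximate value for warm, humid air
K = (0.5, -0.3, 0.1)      # hypothetical sensitivities of F1..F3 to the toy articulator


def toy_formants(a, length_cm):
    """Hypothetical stand-in for the Maeda model: F1-F3 (Hz) of a uniform tube
    closed at the glottis, perturbed by a single articulatory parameter a."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_cm) * (1 + k * a)
            for n, k in enumerate(K, start=1)]


def frame_cost(observed, predicted):
    """Acoustic mismatch: squared error between log formant frequencies."""
    return sum((math.log(o) - math.log(p)) ** 2 for o, p in zip(observed, predicted))


def invert(track, length_cm, grid, smooth_weight):
    """Dynamic-programming inversion: choose one grid value per frame that
    minimises acoustic error plus a penalty on frame-to-frame movement.
    Returns (total_cost, trajectory)."""
    # Codebook of forward-model outputs for this VT length (assumed global-search stage).
    codebook = [toy_formants(a, length_cm) for a in grid]
    cost = [frame_cost(track[0], f) for f in codebook]
    backptr = []
    for obs in track[1:]:
        new_cost, ptrs = [], []
        for j, a in enumerate(grid):
            # Cheapest predecessor once the movement penalty is added.
            i_best = min(range(len(grid)),
                         key=lambda i: cost[i] + smooth_weight * (a - grid[i]) ** 2)
            ptrs.append(i_best)
            new_cost.append(cost[i_best] + smooth_weight * (a - grid[i_best]) ** 2
                            + frame_cost(obs, codebook[j]))
        cost = new_cost
        backptr.append(ptrs)
    # Trace the optimal path back from the cheapest final state.
    j = min(range(len(grid)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for ptrs in reversed(backptr):
        j = ptrs[j]
        path.append(j)
    return total, [grid[i] for i in reversed(path)]


def estimate_length_by_inversion(track, candidates, grid, smooth_weight=0.01):
    """Return the candidate VT length whose inversion has the lowest total cost."""
    return min(candidates,
               key=lambda length: invert(track, length, grid, smooth_weight)[0])


def estimate_length_by_formants(track):
    """Uniform-tube baseline: average (2n-1)c / (4 F_n) over formants and frames."""
    values = [(2 * n - 1) * SPEED_OF_SOUND / (4 * f)
              for frame in track for n, f in enumerate(frame, start=1)]
    return sum(values) / len(values)


if __name__ == "__main__":
    true_length = 15.0
    # Synthetic diphthong-like glide: the toy articulator moves from -0.6 to 0.6.
    trajectory = [-0.6 + 0.1 * t for t in range(13)]
    track = [toy_formants(a, true_length) for a in trajectory]
    grid = [i / 20 for i in range(-20, 21)]           # articulatory codebook
    candidates = [12.0 + 0.5 * i for i in range(13)]  # 12.0 to 18.0 cm
    print("inversion-based:", estimate_length_by_inversion(track, candidates, grid))
    print("formant-based:  ", round(estimate_length_by_formants(track), 2))
```

On this synthetic glide the inversion-based search recovers the 15 cm length used to generate the data, while the uniform-tube baseline lands slightly above it because the articulator shifts the formants away from their neutral-tube values. Applying the idea to real speech would require a real articulatory model, tuned cost weights, and a realistic candidate range.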
Pages: 1711-1715
Number of pages: 5