CONCATENATIVE ARTICULATORY VIDEO SYNTHESIS USING REAL-TIME MRI DATA FOR SPOKEN LANGUAGE TRAINING

Cited by: 0
Authors
Desai, Urvish [1]
Yarra, Chiranjeevi [2]
Ghosh, Prasanta Kumar [2]
Affiliations
[1] Indian Inst Technol ISM, Appl Math, Dhanbad 826004, Bihar, India
[2] Indian Inst Sci IISc, Elect Engn, Bangalore 560012, Karnataka, India
Keywords
Articulatory video synthesis; spoken language training; concatenative synthesis; real-time MRI videos; SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Spoken language training benefits from showing second-language learners videos of native speakers' articulatory movements. Typically, the articulatory video is prepared together with audio that is collected simultaneously with the articulatory recording. Articulatory video recording requires specialized equipment and is therefore expensive and time-consuming. In this work, we propose a concatenative synthesis approach to obtain articulatory videos for audio that may not have a simultaneous articulatory recording. In the training stage of the proposed approach, we build a repository of phoneme-specific articulatory image sequences from the available articulatory videos. During testing, image sequences are selected from this repository so as to ensure smooth transitions across phonetic events. The selected image sequences are then stitched together to synthesize the articulatory video for the test audio. Articulatory videos are synthesized for 50 words that are randomly selected from the MRI-TIMIT database and not seen in the training data. A subjective evaluation of the synthesized videos by twelve subjects suggests that they are close to the original ones, with a rating of 3.78 out of 5, where a score of 5 indicates no difference in quality between the original and the synthesized videos and a score of 1 indicates a great difference.
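The select-and-stitch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the selection criterion, so the join cost used here (squared distance between the last frame of one unit and the first frame of the next), the dynamic-programming search, and all names (`join_cost`, `select_units`, `stitch`, the toy `repository`) are assumptions made for illustration. Frames are represented as flat tuples of floats rather than real MRI images.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]   # hypothetical: one flattened image
Unit = List[Frame]        # hypothetical: image sequence for one phoneme instance


def join_cost(prev_unit: Unit, next_unit: Unit) -> float:
    """Assumed smoothness measure: squared distance across the unit boundary."""
    last, first = prev_unit[-1], next_unit[0]
    return sum((x - y) ** 2 for x, y in zip(last, first))


def select_units(phonemes: List[str],
                 repository: Dict[str, List[Unit]]) -> List[Unit]:
    """Pick one stored unit per phoneme so the summed join cost is minimal."""
    if not phonemes:
        return []
    candidates = [repository[p] for p in phonemes]  # KeyError if a phoneme is missing
    # best[i][j]: minimal cost of any path ending in candidate j of phoneme i
    best = [[0.0] * len(candidates[0])]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for unit in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min])
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Trace the cheapest path back from the last phoneme to the first.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]


def stitch(units: List[Unit]) -> List[Frame]:
    """Concatenate the selected image sequences into one frame list."""
    return [frame for unit in units for frame in unit]


# Toy repository: one-pixel "images", two stored instances for /k/ and /ae/.
repository = {
    "k": [[(0.0,), (1.0,)], [(0.0,), (5.0,)]],
    "ae": [[(5.0,), (6.0,)], [(1.5,), (2.0,)]],
    "t": [[(2.0,), (3.0,)]],
}
video = stitch(select_units(["k", "ae", "t"], repository))
```

In this toy case the search prefers the /k/ and /ae/ instances whose boundary frames are closest to their neighbours, which is one plausible reading of "smooth transition across phonetic events". A real system would also need to match unit durations to the phoneme timings of the test audio, which this sketch ignores.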
Pages: 4999-5003
Page count: 5