CONCATENATIVE ARTICULATORY VIDEO SYNTHESIS USING REAL-TIME MRI DATA FOR SPOKEN LANGUAGE TRAINING

Cited by: 0

Authors:

Desai, Urvish [1]

Yarra, Chiranjeevi [2]

Ghosh, Prasanta Kumar [2]

Affiliations:

[1] Indian Inst Technol ISM, Appl Math, Dhanbad 826004, Bihar, India

[2] Indian Inst Sci IISc, Elect Engn, Bangalore 560012, Karnataka, India

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Articulatory video synthesis; spoken language training; concatenative synthesis; real-time MRI videos; SPEECH;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Spoken language training benefits from showing second-language learners a video of native speakers' articulatory movements. Typically, the articulatory video is prepared together with audio that is collected simultaneously with the articulatory recording. Articulatory video recording requires specialized equipment and is therefore expensive and time-consuming. In this work, we propose a concatenative synthesis approach to obtain articulatory videos for audio that may not have a simultaneous articulatory recording. In the training stage of the proposed approach, we build a repository of phoneme-specific articulatory image sequences from the available articulatory video. During testing, image sequences are selected from this repository so as to ensure a smooth transition across phonetic events. The selected image sequences are finally stitched together to synthesize the articulatory video for the test audio. Articulatory videos are synthesized for 50 words randomly selected from the MRI-TIMIT database and not seen in the training data. Subjective evaluation of the quality of the synthesized videos by twelve subjects suggests that the videos are close to the original ones, with a rating of 3.78 out of 5, where a score of 5 indicates no difference and a score of 1 indicates a great difference in quality between the original and the synthesized videos.
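
The select-and-stitch idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the selection criterion, so everything here is an assumption made for illustration: the frame representation (flat lists of pixel intensities), the join cost (Euclidean distance between the last frame of one unit and the first frame of the next), the dynamic-programming search, and the names `join_cost`, `select_units`, and `stitch` are all hypothetical and are not taken from the paper.

```python
from math import dist

# Assumed representation (not from the paper): a "unit" is one
# phoneme-specific image sequence, i.e. a list of frames, and each frame
# is a flat list of pixel intensities.


def join_cost(prev_unit, next_unit):
    """Assumed smoothness measure: Euclidean distance between the last
    frame of one unit and the first frame of the next."""
    return dist(prev_unit[-1], next_unit[0])


def select_units(phonemes, repository):
    """Pick one unit per phoneme so that the summed join cost is minimal.

    `phonemes` is a non-empty list of labels; `repository` maps each label
    to a non-empty list of candidate units. Uses dynamic programming.
    """
    candidates = [repository[p] for p in phonemes]
    # best[i][j] = (lowest cost of any path ending in candidate j of
    # phoneme i, index of the chosen predecessor candidate)
    best = [[(0.0, None) for _ in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for unit in candidates[i]:
            options = [
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            ]
            row.append(min(options))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    chosen = []
    for i in range(len(candidates) - 1, -1, -1):
        chosen.append(candidates[i][j])
        j = best[i][j][1]
    chosen.reverse()
    return chosen, total


def stitch(units):
    """Concatenate the selected units into one frame sequence."""
    return [frame for unit in units for frame in unit]
```

Calling `select_units(["k", "ae", "t"], repository)` with a dictionary of candidate units per phoneme returns the chosen units and their total join cost, and `stitch` turns the chosen units into a single frame list. A real system would also need to match unit durations to the phoneme boundaries of the test audio, which this sketch leaves out.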


Pages: 4999-5003

Page count: 5

