Inter-speaker synchronization in audiovisual database for lip-readable speech to animation conversion

Cited by: 0

Authors:

Feldhoffer, Gergely [1]

Oroszi, Balazs [1]

Takacs, Gyoergy [1]

Tihanyi, Attila [1]

Bardi, Tamas [1]

Affiliation:

[1] Peter Pazmany Catholic Univ, Fac Informat Technol, Budapest, Hungary

Source:

TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2007 / Vol. 4629

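Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract: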

The present study proposes an inter-speaker audiovisual synchronization method to decrease the speaker dependency of our direct speech-to-animation conversion system. Our aim is to convert an everyday speaker's voice to lip-readable facial animation for hearing-impaired users. This conversion needs mixed training data: acoustic features from normal speakers coupled with visual features from professional lip-speakers. Audio and video data of normal and professional speakers were synchronized with the Dynamic Time Warping (DTW) method. The quality and usefulness of the synchronization were investigated in a subjective test that measured noticeable conflicts between the audio and visual parts of the speech stimuli. An objective test was also carried out by training a neural network on the synchronized audiovisual data with an increasing number of speakers.
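The abstract names Dynamic Time Warping but gives no details of the features, distance measure, or step pattern used. The following is a minimal sketch of textbook DTW on two scalar sequences, not the authors' implementation; the function name `dtw_path`, the absolute-difference distance, and the symmetric step pattern are all assumptions made for illustration. In a setting like the one described, each sequence element would presumably be an acoustic feature vector per frame, and the resulting path would map frames of one speaker onto frames of another.

```python
from math import inf


def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Align sequence a to sequence b; return (total_cost, path).

    Hypothetical helper for illustration only. The path is a list of
    (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# The second sequence repeats the value 1, so index 1 of the first
# sequence is matched to two indices of the second.
total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(total, path)
```

Given such a path, the visual features recorded for one speaker could be re-timed frame by frame to the audio of another, which is the kind of pairing the mixed training data described above would require.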

Pages: 447-454

Page count: 8
