Inter-speaker synchronization in audiovisual database for lip-readable speech to animation conversion

Cited by: 0
Authors
Feldhoffer, Gergely [1]
Oroszi, Balazs [1]
Takacs, Gyoergy [1]
Tihanyi, Attila [1]
Bardi, Tamas [1]
Affiliations
[1] Peter Pazmany Catholic Univ, Fac Informat Technol, Budapest, Hungary
Source
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The present study proposes an inter-speaker audiovisual synchronization method to reduce the speaker dependency of our direct speech-to-animation conversion system. Our aim is to convert an everyday speaker's voice into lip-readable facial animation for hearing-impaired users. This conversion needs mixed training data: acoustic features from normal speakers coupled with visual features from professional lip-speakers. Audio and video data of normal and professional speakers were synchronized using the Dynamic Time Warping (DTW) method. The quality and usefulness of the synchronization were investigated in a subjective test that measured noticeable conflicts between the audio and visual parts of the speech stimuli. An objective test was also carried out by training a neural network on the synchronized audiovisual data with an increasing number of speakers.
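The abstract names Dynamic Time Warping as the synchronization method but does not give the features, distance measure, or step pattern that were used. The following is only a minimal sketch of textbook DTW under assumed choices: Euclidean frame distance, the symmetric three-step pattern, and a hypothetical function name `dtw_align`. It illustrates how the frames of one speaker's recording could be mapped onto the frames of another speaker's recording of the same utterance; it is not the authors' implementation.

```python
import math

def dtw_align(ref, other):
    """Align two sequences of feature frames with classic DTW.

    ref, other: lists of equal-dimension frames (tuples of floats).
    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching ref[i] to other[j].
    Assumptions (not from the paper): Euclidean local distance and
    the step pattern {diagonal, vertical, horizontal}.
    """
    n, m = len(ref), len(other)
    inf = math.inf
    # acc[i][j] = minimal accumulated cost of aligning ref[:i] with other[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], other[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]]
        k = steps.index(min(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return acc[n][m], path
```

In a setting like the one the abstract describes, the path computed on the two speakers' audio frames could then be used to pair one speaker's acoustic frames with the other speaker's visual frames; how the original system does this in detail is not stated in the abstract.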
Pages: 447-454
Page count: 8
Related papers
18 in total
  • [1] Database construction for speech to lip-readable animation conversion
    Takacs, Gyorgy
    Tihanyi, Attila
    Bardi, Tamas
    Feldhoffer, Gergo
    Srancsi, Balint
    [J]. PROCEEDINGS ELMAR-2006, 2006, : 151 - +
  • [2] Modeling inter-speaker variability in speech recognition
    Cloarec, Gwenael
    Jouvet, Denis
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4529 - 4532
  • [3] Audiovisual speech processing - Lip reading and lip synchronization
    Chen, TH
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2001, 18 (01) : 9 - 21
  • [4] Audiovisual Speaker Identification Based on Lip and Speech Modalities
    Chelali, Fatma
    Djeradi, Amar
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (01) : 99 - 110
  • [5] Studies on inter-speaker variability in speech and its application in automatic speech recognition
    Umesh, S.
    [J]. Sadhana, 2011, 36 : 853 - 883
  • [6] Studies on inter-speaker variability in speech and its application in automatic speech recognition
    Umesh, S.
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2011, 36 (05) : 853 - 883
  • [7] Inter-speaker variability: speaker normalisation and quantitative estimation of articulatory invariants in speech production for French
    Serrurier, Antoine
    Badin, Pierre
    Boe, Louis-Jean
    Lamalle, Laurent
    Neuschaefer-Rube, Christiane
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2272 - 2276
  • [8] Intra-speaker and inter-speaker variability in speech sound pressure level across repeated readings
    Castellana, Antonella
    Carullo, Alessio
    Astolfi, Arianna
    Puglisi, Giuseppina Emma
    Fugiglando, Umberto
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (04) : 2353 - 2363
  • [9] Intra-speaker phonetic variation in read speech: comparison with inter-speaker variability in a controlled population
    Audibert, Nicolas
    Fougeron, Cecile
    [J]. INTERSPEECH 2022, 2022, : 4755 - 4759
  • [10] Voice conversion based on probabilistic parameter transformation and extended inter-speaker residual prediction
    Hanzlicek, Zdenek
    Matousek, Jindrich
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 480 - 487