Accurate Visual Speech Synthesis Based on Diviseme Unit Selection and Concatenation

Cited by: 0
Authors
Jiang, Dongmei [1]
Ravyse, Ilse [2]
Sahli, Hichem [2]
Zhang, Yanning [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Joint Res Grp Audio Visual Signal Proc, 127 Youyi Xilu, Xian 710072, Peoples R China
[2] Vrije Univ Brussel, Dept ETRO, B-1050 Brussels, Belgium
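Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract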
This paper presents a novel speech-driven approach to accurate, realistic visual speech synthesis. First, an audio-visual instance database is built for different viseme context combinations, i.e., diviseme units, using 100 audio-visual speech sentences from a female speaker. Then a diviseme instance selection algorithm is introduced to choose the optimal diviseme instances for the viseme contexts in the input speech, considering the concatenation smoothness of the image sequences, the match between the mouth movements and the acoustic pronunciation process, and the intensity of the input speech. Finally, the mouth image sequences of the corresponding viseme segments in the selected diviseme instances are time-warped and blended to construct the mouth images of the final animation. Visual speech synthesis experiments and subjective evaluation results show that the resulting mouth animations are not only realistic, with clear and smooth mouth images, but also in good accordance with the acoustic pronunciation and intensity of the input speech.
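The selection step described in the abstract weighs how well each candidate instance fits the input speech against how smoothly neighbouring instances join. The abstract does not give the cost functions or the search procedure, so the following is only a minimal sketch of a generic unit-selection search using dynamic programming (Viterbi-style), not the authors' algorithm. The names `select_units`, `target_cost`, `join_cost`, `w_target`, and `w_join` are hypothetical, and the scalar "mouth opening" values in the toy data are an illustrative stand-in for real diviseme instances.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_units(
    candidates: Sequence[Sequence[T]],
    target_cost: Callable[[int, T], float],
    join_cost: Callable[[T, T], float],
    w_target: float = 1.0,
    w_join: float = 1.0,
) -> List[T]:
    """Pick one candidate per slot, minimising weighted target + join costs.

    Hypothetical helper: candidates[t] holds the database instances that
    could fill slot t (e.g. one diviseme context of the input speech).
    """
    if not candidates or any(len(slot) == 0 for slot in candidates):
        raise ValueError("every slot needs at least one candidate")

    # best[j]: cheapest cost of any path ending in candidate j of the current slot.
    best = [w_target * target_cost(0, c) for c in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index for slot t

    for t in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for c in candidates[t]:
            # Cost of reaching c from each candidate of the previous slot.
            costs = [
                best[i] + w_join * join_cost(prev, c)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + w_target * target_cost(t, c))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last slot to the first.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


# Toy data (assumed): each candidate is a single "mouth opening" value,
# and targets[t] is the opening suggested by the input speech at slot t.
toy_candidates = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.0]]
toy_targets = [1.0, 0.9, 0.1]


def toy_target_cost(t: int, c: float) -> float:
    return abs(c - toy_targets[t])


def toy_join_cost(a: float, b: float) -> float:
    return abs(a - b)


chosen = select_units(toy_candidates, toy_target_cost, toy_join_cost)
```

Raising `w_join` relative to `w_target` favours smoother transitions at the expense of fidelity to the input speech, which mirrors the trade-off the abstract names between concatenation smoothness and agreement with the acoustic signal.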
Pages: 910 / +
Page count: 2