IMPROVING CNN-BASED VISEME RECOGNITION USING SYNTHETIC DATA

Authors
Mattos, Andrea Britto [1]
Borges Oliveira, Dario Augusto [1]
Morais, Edmilson da Silva [1]
Affiliations
[1] IBM Res, Rua Tutoia 1157, Sao Paulo, Brazil
Keywords
Image recognition; Speech recognition; Computer graphics; Machine learning
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Recently, Deep Learning-based methods have obtained high accuracy for the problem of Visual Speech Recognition. However, while good results have been reported for words and sentences, recognizing shorter segments of speech, like phones, has proven to be much more challenging due to the lack of temporal and contextual information. In this work, we address the problem of recognizing visemes, which are the visual equivalent of phonemes (the smallest distinguishable sound units in a spoken word). Viseme recognition has applications in tasks such as lip synchronization, but acquiring and labeling a viseme dataset is complex and time-consuming. We tackle this problem by creating a large-scale, automatically labelled synthetic 2D dataset based on realistic 3D facial models. Then, we extract real viseme images from the GRID corpus, using audio data to locate phonemes via forced phonetic alignment and the registered video to extract the corresponding visemes, and evaluate the applicability of the synthetic dataset for recognizing real-world data.
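The extraction step described in the abstract (locating phonemes in the audio, then taking the matching video frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the `PhoneInterval` type, the `PHONE_TO_VISEME` table, the choice of the interval midpoint, and the default frame rate of 25 fps are all assumptions made for illustration, and the intervals are presumed to come from some external forced aligner.

```python
from dataclasses import dataclass


@dataclass
class PhoneInterval:
    """One phone from a forced alignment (hypothetical structure)."""
    label: str    # phone label, e.g. "b"
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical, deliberately tiny phone-to-viseme table; the classes
# actually used in the paper are not stated in the abstract.
PHONE_TO_VISEME = {
    "p": "PBM", "b": "PBM", "m": "PBM",
    "f": "FV", "v": "FV",
}


def middle_frame(interval: PhoneInterval, fps: float = 25.0) -> int:
    """Index of the video frame at the temporal midpoint of the phone.

    The fps default is an assumption; use the real frame rate of the video.
    """
    midpoint = (interval.start + interval.end) / 2.0
    return int(midpoint * fps)


def viseme_frames(intervals, fps: float = 25.0):
    """Return (viseme label, frame index) pairs for phones with a known viseme."""
    pairs = []
    for interval in intervals:
        viseme = PHONE_TO_VISEME.get(interval.label)
        if viseme is None:
            continue  # phone not covered by the illustrative table
        pairs.append((viseme, middle_frame(interval, fps)))
    return pairs


if __name__ == "__main__":
    alignment = [
        PhoneInterval("b", 0.5, 0.75),
        PhoneInterval("ih", 0.75, 1.0),
        PhoneInterval("f", 1.0, 1.25),
    ]
    print(viseme_frames(alignment))
```

Taking the midpoint frame is only one plausible choice; a window of frames around the midpoint would work with the same interval data.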
Pages: 6