IMPROVING CNN-BASED VISEME RECOGNITION USING SYNTHETIC DATA

Cited by: 0
Authors
Mattos, Andrea Britto [1]
Borges Oliveira, Dario Augusto [1]
Morais, Edmilson da Silva [1]
Affiliations
[1] IBM Res, Rua Tutoia 1157, Sao Paulo, Brazil
Keywords
Image recognition; Speech recognition; Computer graphics; Machine learning
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Recently, deep learning-based methods have achieved high accuracy in visual speech recognition. However, while good results have been reported for words and sentences, recognizing shorter segments of speech, such as phones, has proven much more challenging because temporal and contextual information is lacking. In this work, we address the problem of recognizing visemes, which are the visual equivalent of phonemes (the smallest distinguishable sound units in a spoken word). Viseme recognition has applications in tasks such as lip synchronization, but acquiring and labeling a viseme dataset is complex and time-consuming. We tackle this problem by creating a large-scale, automatically labeled synthetic 2D dataset based on realistic 3D facial models. Then, we extract real viseme images from the GRID corpus, using audio data to locate phonemes via forced phonetic alignment and the registered video to extract the corresponding visemes, and evaluate the applicability of the synthetic dataset for recognizing real-world data.
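The extraction step described in the abstract (locating phonemes by forced alignment, then pulling the matching video frames) can be illustrated with a minimal sketch. The abstract does not give the details, so everything concrete here is an assumption: the 25 fps frame rate, the choice of the frame at the midpoint of each phoneme interval, the `Interval` record, and the small `PHONEME_TO_VISEME` table are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple

FPS = 25.0  # assumed video frame rate; set this to match the actual recordings

# Hypothetical, deliberately tiny phoneme-to-viseme table, for illustration only.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
}


class Interval(NamedTuple):
    """One phoneme segment as a forced aligner might report it."""
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def viseme_frames(alignment, fps=FPS):
    """Return (frame_index, viseme_label) for each phoneme with a known viseme."""
    samples = []
    for seg in alignment:
        label = PHONEME_TO_VISEME.get(seg.phoneme.lower())
        if label is None:
            continue  # skip silence and phonemes outside the table
        midpoint = 0.5 * (seg.start + seg.end)
        # Assumption: the frame at the interval midpoint represents the viseme.
        samples.append((int(midpoint * fps), label))
    return samples


alignment = [
    Interval("sil", 0.00, 0.20),
    Interval("b", 0.20, 0.30),
    Interval("ih", 0.30, 0.40),
    Interval("uw", 1.00, 1.20),
]
print(viseme_frames(alignment))
```

The returned frame indices would then be used to crop mouth regions from the registered video; that step is omitted because the abstract does not describe it.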
Pages: 6