Lipreading model based on a two-way convolutional neural network and feature fusion

Cited by: 1
Authors
Zhu, Meili [1]
Wang, Qingqing [2]
Ge, Yingying [1]
Affiliations
[1] Jilin Animat Inst, Sch Game, Changchun, Peoples R China
[2] Jilin Animat Inst, Sch Animat Art, Changchun, Peoples R China
Keywords
visual speech recognition; bidirectional dynamic image; histogram of oriented gradients; convolutional neural network; RECOGNITION; CLASSIFICATION; IMAGE;
DOI
10.1117/1.JEI.30.6.063003
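Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification code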
0808 ; 0809 ;
Abstract
Lipreading feature extraction is essentially feature extraction from continuous video frame sequences. A lipreading model based on a two-way convolutional neural network and feature fusion is proposed to obtain more reasonable visual spatial-temporal characteristics. Unlike other deep-learning-based lipreading methods, the proposed method uses rank pooling to transform a lip video into a standard RGB image that can be input directly into the convolutional neural network, which effectively reduces the input dimension. In addition, to compensate for the lack of spatial information, the apparent shape and depth features are fused, and a joint cost function is then used to guide the learning of the network model toward more discriminative features. The method was evaluated on the public GRID and OuluVS2 databases. The results show that its accuracy can exceed 93%, which validates the effectiveness of the method. (C) 2021 SPIE and IS&T
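The rank pooling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so this block assumes the commonly used linear approximation of rank pooling, with frame weights alpha_t = 2t - T - 1. The function name `dynamic_image`, the clip size, and the min-max rescaling to 8-bit RGB are illustrative choices, not taken from the paper.

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, 3) clip into one (H, W, 3) uint8 image.

    Assumption: linear approximate rank pooling, not necessarily the
    exact variant used by the authors.
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    # Weights grow linearly with time and sum to zero, so static
    # content cancels out and only temporal change remains.
    alpha = 2.0 * np.arange(1, T + 1) - T - 1
    pooled = np.tensordot(alpha, frames, axes=(0, 0))
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        # No motion at all: return a blank image.
        return np.zeros(pooled.shape, dtype=np.uint8)
    # Rescale to the 0..255 range expected by an RGB-input CNN.
    return np.round(255 * (pooled - lo) / (hi - lo)).astype(np.uint8)

# Hypothetical 16-frame lip clip of 64x64 RGB frames (random data).
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(16, 64, 64, 3))
forward = dynamic_image(clip)         # frames in original order
backward = dynamic_image(clip[::-1])  # frames in reversed order
```

Whatever the clip length, the output is a single H x W x 3 image, which is the sense in which the input dimension is reduced. Pooling the reversed clip as well is one plausible reading of the "bidirectional dynamic image" keyword. With these linear weights the reversed result is simply the inverse of the forward image, so a real two-stream model would presumably use a non-symmetric weighting.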
Pages: 14