Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory

Cited by: 26
Authors
Lu, Yuanyao [1]
Li, Hongbo [1]
Affiliation
[1] North China Univ Technol, Sch Informat Sci & Technol, Beijing 100144, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 08
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
virtual reality (VR); self-attention; automatic lip-reading; sensory input; deep learning;
DOI
10.3390/app9081599
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
With improvements in computer performance, virtual reality (VR), as a new mode of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment, the user's state can be captured through lip movements, which makes it possible to analyze the user's thinking in real time. Because of complex image processing, hard-to-train classifiers, and long recognition times, traditional lip-reading recognition systems struggle to meet the requirements of practical applications. In this paper, a convolutional neural network (CNN) used for image feature extraction is combined with a recurrent neural network (RNN) based on an attention mechanism for automatic lip-reading recognition. The proposed method can be divided into three steps. First, we extract keyframes from our own independent database (English pronunciation of the digits zero to nine by three males and three females). Then, we use the Visual Geometry Group (VGG) network to extract lip image features; the extracted features are found to be fault-tolerant and effective. Finally, we compare two lip-reading models: (1) a fusion model with an attention mechanism and (2) a fusion model of two networks. The results show that the accuracy on the test dataset is 88.2% for the proposed model and 84.9% for the contrastive model. Therefore, our proposed method is superior to the traditional lip-reading recognition methods and the general neural networks.
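To make the attention step mentioned in the abstract more concrete, the following is a minimal sketch of attention-weighted pooling over a sequence of per-keyframe feature vectors. It is not the authors' implementation: the dot-product scoring, the `query` vector, the function name `attention_pool`, and the sizes `T` and `D` are all assumptions chosen for illustration, and random numbers stand in for the recurrent network's per-frame outputs.

```python
import numpy as np

def attention_pool(frame_features, query):
    """Collapse a (T, D) sequence into one D-vector using softmax attention.

    frame_features: one row per keyframe (hypothetical stand-in for the
                    per-frame outputs of a recurrent network).
    query:          a D-vector used to score each frame (an assumption;
                    in a trained model this would be learned).
    """
    scores = frame_features @ query            # one score per frame, shape (T,)
    scores = scores - scores.max()             # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax, sums to 1
    context = weights @ frame_features         # weighted average, shape (D,)
    return context, weights

# Hypothetical sizes: 10 keyframes, 16 features per frame.
rng = np.random.default_rng(0)
T, D = 10, 16
frame_features = rng.standard_normal((T, D))
query = rng.standard_normal(D)

context, weights = attention_pool(frame_features, query)
print(context.shape, weights.shape)
```

Frames with larger scores contribute more to `context`, which is the general idea behind letting a classifier focus on the most informative keyframes instead of treating every frame equally.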
Pages: 12
Related papers
50 in total
  • [1] Automated Lip-Reading Robotic System Based on Convolutional Neural Network and Long Short-Term Memory
    Gholipour, Amir
    Taheri, Alireza
    Mohammadzade, Hoda
    [J]. SOCIAL ROBOTICS, ICSR 2021, 2021, 13086 : 73 - 84
  • [2] Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses
    Huang, Kun-Yi
    Wu, Chung-Hsien
    Su, Ming-Hsiang
    [J]. PATTERN RECOGNITION, 2019, 88 : 668 - 678
  • [3] A Deep Neural Network Model for Short-Term Load Forecast Based on Long Short-Term Memory Network and Convolutional Neural Network
    Tian, Chujie
    Ma, Jian
    Zhang, Chunhong
    Zhan, Panpan
    [J]. ENERGIES, 2018, 11 (12)
  • [4] Automatic Lip Reading Using Convolution Neural Network and Bidirectional Long Short-term Memory
    Lu, Yuanyao
    Yan, Jie
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (01)
  • [5] Short-Term Traffic Congestion Forecasting Using Attention-Based Long Short-Term Memory Recurrent Neural Network
    Zhang, Tianlin
    Liu, Ying
    Cui, Zhenyu
    Leng, Jiaxu
    Xie, Weihong
    Zhang, Liang
    [J]. COMPUTATIONAL SCIENCE - ICCS 2019, PT III, 2019, 11538 : 304 - 314
  • [6] Attention-based long short-term memory fully convolutional network for chemical process fault diagnosis
    Xiong, Shanwei
    Zhou, Li
    Dai, Yiyang
    Ji, Xu
    [J]. CHINESE JOURNAL OF CHEMICAL ENGINEERING, 2023, 56 : 1 - 14
  • [7] Attention-based long short-term memory fully convolutional network for chemical process fault diagnosis
    Shanwei Xiong
    Li Zhou
    Yiyang Dai
    Xu Ji
    [J]. Chinese Journal of Chemical Engineering, 2023, 56 (04) : 1 - 14
  • [8] Speech emotion recognition based on convolutional neural network with attention-based bidirectional long short-term memory network and multi-task learning
    Liu, Zhen-Tao
    Han, Meng-Ting
    Wu, Bao-Han
    Rehman, Abdul
    [J]. APPLIED ACOUSTICS, 2023, 202
  • [9] Hybrid attention-based Long Short-Term Memory network for sarcasm identification
    Pandey, Rajnish
    Kumar, Abhinav
    Singh, Jyoti Prakash
    Tripathi, Sudhakar
    [J]. APPLIED SOFT COMPUTING, 2021, 106
  • [10] Attention-based long short-term memory network temperature prediction model
    Kun, Xiao
    Shan, Tian
    Yi, Tan
    Chao, Chen
    [J]. PROCEEDINGS OF 2021 7TH INTERNATIONAL CONFERENCE ON CONDITION MONITORING OF MACHINERY IN NON-STATIONARY OPERATIONS (CMMNO), 2021 : 278 - 281