Lipreading model based on a two-way convolutional neural network and feature fusion

Cited by: 1

Authors:

Zhu, Meili [1]

Wang, Qingqing [2]

Ge, Yingying [1]

Affiliations:

[1] Jilin Animat Inst, Sch Game, Changchun, Peoples R China

[2] Jilin Animat Inst, Sch Animat Art, Changchun, Peoples R China

Source:

JOURNAL OF ELECTRONIC IMAGING | 2021 / Vol. 30 / Issue 06

Keywords:

visual speech recognition; bidirectional dynamic image; histogram of oriented gradients; convolutional neural network; RECOGNITION; CLASSIFICATION; IMAGE;

DOI:

10.1117/1.JEI.30.6.063003

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Lipreading feature extraction is essentially feature extraction from continuous video frame sequences. A lipreading model based on a two-way convolutional neural network and feature fusion is proposed to obtain more reasonable visual spatial-temporal characteristics. Unlike other lipreading methods based on deep learning, the proposed approach uses rank pooling to transform a lip video into a standard RGB image that can be input directly into the convolutional neural network, which effectively reduces the input dimension. In addition, to compensate for the lack of spatial information, the apparent shape and depth features are fused, and a joint cost function then guides the network model learning to obtain more discriminative features. The method was evaluated on the public GRID and OuluVS2 databases. The results show that the accuracy of the proposed method can exceed 93%, which validates its effectiveness. © 2021 SPIE and IS&T
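The abstract does not give the rank pooling formulation used in the paper. As a rough illustration only, the sketch below uses approximate rank pooling with linear weights alpha_t = 2t - T - 1, a common simplification in the dynamic-image literature and not necessarily the authors' method. The function names, the (T, H, W, 3) clip layout, and the min-max rescaling to 8 bits are all assumptions made for this example. The forward and reversed passes are a guess at what the keyword "bidirectional dynamic image" refers to.

```python
import numpy as np


def approximate_rank_pooling(frames):
    """Collapse a clip of shape (T, H, W, 3) into one (H, W, 3) uint8 image.

    Hypothetical helper: uses the simplified weights alpha_t = 2t - T - 1,
    so later frames contribute positively and earlier frames negatively.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    # Weights for t = 1..T; they sum to zero, so static content cancels out.
    alpha = 2.0 * np.arange(1, num_frames + 1) - num_frames - 1
    # Weighted sum over the time axis.
    pooled = np.tensordot(alpha, frames, axes=(0, 0))
    # Assumed post-processing: min-max rescale to the 0..255 range.
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        return np.zeros(pooled.shape, dtype=np.uint8)
    return np.round(255.0 * (pooled - lo) / (hi - lo)).astype(np.uint8)


def bidirectional_dynamic_images(frames):
    """Return (forward, backward) dynamic images for one clip."""
    frames = np.asarray(frames)
    forward = approximate_rank_pooling(frames)
    backward = approximate_rank_pooling(frames[::-1])  # time-reversed clip
    return forward, backward
```

With this weighting, a T-frame lip clip becomes a single RGB-sized array (or two, one per direction) that a standard image CNN can take as input, which is the input-dimension reduction the abstract describes.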

Pages: 14
