Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN

Authors
Erbey, Ali [1 ,2 ]
Barisci, Necaattin [3 ]
Affiliations
[1] Usak Univ, Distance Educ Vocat Sch, Dept Comp Programming, TR-64200 Usak, Turkiye
[2] Gazi Univ, Informat Inst, Informat Syst, TR-06560 Ankara, Turkiye
[3] Gazi Univ, Fac Technol, Dept Comp Engn, TR-06560 Ankara, Turkiye
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 02
Keywords
lip-reading; ensemble learning; 3DCNN; RECOGNITION;
DOI
10.3390/app15020563
Chinese Library Classification
O6 [Chemistry];
Subject classification code
0703;
Abstract
Understanding others correctly is of great importance for maintaining effective communication. Factors such as hearing difficulties or environmental noise can disrupt this process. Lip reading offers an effective solution to these challenges. With the growing success of deep learning architectures, research on lip reading has gained momentum. The aim of this study is to create a lip-reading dataset for Turkish digit recognition and to conduct predictive analyses. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models, including C3D, I3D, and 3DCNN+BiLSTM, were used. While LSTM models are effective in processing temporal data, 3DCNN-based models, which can process both spatial and temporal information, achieved higher accuracy in this study. Experimental results showed that the dataset containing only the lip region performed better; accuracy rates for CNN, LSTM, C3D, and I3D on the lip region were 67.12%, 75.53%, 86.32%, and 93.24%, respectively. The 3DCNN-based models achieved higher accuracy due to their ability to process spatio-temporal data. Furthermore, an additional 1.23% improvement was achieved through ensemble learning, with the best result reaching 94.53% accuracy. Ensemble learning, by combining the strengths of different models, provided a meaningful improvement in overall performance. These results demonstrate that 3DCNN architectures and ensemble learning methods yield high success in addressing the problem of lip reading in the Turkish language. While our study focuses on Turkish digit recognition, the proposed methods have the potential to be successful in other languages or broader lip-reading applications.
Pages: 23