Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN

Authors
Erbey, Ali [1 ,2 ]
Barisci, Necaattin [3 ]
Affiliations
[1] Usak Univ, Distance Educ Vocat Sch, Dept Comp Programming, TR-64200 Usak, Turkiye
[2] Gazi Univ, Informat Inst, Informat Syst, TR-06560 Ankara, Turkiye
[3] Gazi Univ, Fac Technol, Dept Comp Engn, TR-06560 Ankara, Turkiye
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 02
Keywords
lip-reading; ensemble learning; 3DCNN; RECOGNITION;
DOI
10.3390/app15020563
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703 ;
Abstract
Understanding others correctly is essential for effective communication, and factors such as hearing difficulties or environmental noise can disrupt it. Lip reading offers a way to address these challenges, and research on it has gained momentum with the growing success of deep learning architectures. The aim of this study is to create a lip-reading dataset for Turkish digit recognition and to conduct predictive analyses on it. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models, including C3D, I3D, and 3DCNN+BiLSTM, were used. While LSTM models are effective in processing temporal data, 3DCNN-based models, which process both spatial and temporal information, achieved higher accuracy in this study. Experimental results showed that the dataset containing only the lip region performed better; accuracy rates for CNN, LSTM, C3D, and I3D on the lip region were 67.12%, 75.53%, 86.32%, and 93.24%, respectively. Furthermore, an additional 1.23% improvement was achieved through ensemble learning, with the best result reaching 94.53% accuracy. By combining the strengths of different models, ensemble learning provided a meaningful improvement in overall performance. These results indicate that 3DCNN architectures and ensemble learning methods perform well on lip reading in Turkish. While the study focuses on Turkish digit recognition, the proposed methods have the potential to be successful in other languages or broader lip-reading applications.
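The abstract does not say how the ensemble combines its member models. As a minimal sketch of one common option, soft voting, the snippet below averages the class probabilities produced by several classifiers and picks the highest-scoring class. The function name `soft_vote`, the optional weights, and the toy probability values are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    Returns (predicted_labels, averaged_probabilities).
    NOTE: soft voting is an assumed ensemble scheme, used here for illustration only.
    """
    probs = np.stack(prob_list, axis=0)            # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(probs.shape[0])          # equal weight per model
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalise so the result stays a distribution
    avg = np.tensordot(weights, probs, axes=1)     # (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# Hypothetical outputs of two models for 2 clips over 3 classes (toy values).
model_a = np.array([[0.6, 0.3, 0.1],
                    [0.4, 0.5, 0.1]])
model_b = np.array([[0.5, 0.4, 0.1],
                    [0.7, 0.2, 0.1]])

labels, avg = soft_vote([model_a, model_b])
print(labels)
```

In this toy case the first model alone would assign the second clip to class 1, while the averaged probabilities favour class 0. This is the kind of correction an ensemble can make when its members disagree.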
Pages: 23