Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN

Authors
Erbey, Ali [1, 2]
Barisci, Necaattin [3]
Affiliations
[1] Usak Univ, Distance Educ Vocat Sch, Dept Comp Programming, TR-64200 Usak, Turkiye
[2] Gazi Univ, Informat Inst, Informat Syst, TR-06560 Ankara, Turkiye
[3] Gazi Univ, Fac Technol, Dept Comp Engn, TR-06560 Ankara, Turkiye
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / No. 2
Keywords
lip-reading; ensemble learning; 3DCNN; recognition
DOI
10.3390/app15020563
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Understanding others correctly is essential for effective communication, and factors such as hearing difficulties or environmental noise can disrupt this process. Lip reading offers a way to address these challenges, and with the growing success of deep learning architectures, research on lip reading has gained momentum. The aim of this study is to create a lip-reading dataset for Turkish digit recognition and to conduct predictive analyses on it. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models (C3D, I3D, and 3DCNN+BiLSTM) were used. While LSTM models are effective at processing temporal data, the 3DCNN-based models, which process spatial and temporal information jointly, achieved higher accuracy in this study. Experimental results showed that the subset containing only the lip region performed better; on the lip region, CNN, LSTM, C3D, and I3D achieved accuracies of 67.12%, 75.53%, 86.32%, and 93.24%, respectively. Ensemble learning yielded an additional improvement of 1.23%, with the best result reaching 94.53% accuracy; by combining the strengths of different models, it provided a meaningful gain in overall performance. These results demonstrate that 3DCNN architectures and ensemble learning methods achieve high accuracy on Turkish lip reading. Although this study focuses on Turkish digit recognition, the proposed methods may also prove successful in other languages or in broader lip-reading applications.
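The abstract does not say how the individual models' outputs were combined in the ensemble. As an illustration only, the sketch below assumes soft voting: each model's per-class probabilities are averaged (optionally weighted) and the class with the highest averaged probability is predicted. The function names `soft_vote` and `accuracy`, the weights, and the probability values are all hypothetical and are not taken from the paper; in a digit task each inner list would hold one probability per digit class.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Combine per-model class probabilities by (weighted) averaging.

    model_probs[m][i][c] is model m's probability for class c on sample i.
    Returns the index of the highest averaged probability for each sample.
    (Hypothetical helper; soft voting is an assumed ensemble scheme.)
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * n_models  # equal weights = plain averaging
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])

    predictions = []
    for i in range(n_samples):
        avg = [0.0] * n_classes
        for probs, w in zip(model_probs, weights):
            for c in range(n_classes):
                avg[c] += w * probs[i][c] / total
        # Argmax over classes; ties resolve to the lowest class index.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


def accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up softmax outputs of two hypothetical models (2 samples, 3 classes).
    model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    model_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
    preds = soft_vote([model_a, model_b])
    print(preds)                    # [1, 2]
    print(accuracy(preds, [1, 1]))  # 0.5
```

Soft voting lets a confident model outweigh an uncertain one, which plain majority voting over predicted labels cannot do; passing unequal `weights` (for example, favoring a stronger model such as I3D) is one possible variant, again an assumption rather than the paper's documented setup.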
Pages: 23