Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN

Authors
Erbey, Ali [1 ,2 ]
Barisci, Necaattin [3 ]
Affiliations
[1] Usak Univ, Distance Educ Vocat Sch, Dept Comp Programming, TR-64200 Usak, Turkiye
[2] Gazi Univ, Informat Inst, Informat Syst, TR-06560 Ankara, Turkiye
[3] Gazi Univ, Fac Technol, Dept Comp Engn, TR-06560 Ankara, Turkiye
Source
APPLIED SCIENCES-BASEL | 2025 / Volume 15 / Issue 02
Keywords
lip-reading; ensemble learning; 3DCNN; RECOGNITION;
DOI
10.3390/app15020563
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Understanding others correctly is essential for effective communication, but factors such as hearing difficulties or environmental noise can disrupt this process. Lip reading offers an effective solution to these challenges, and with the growing success of deep learning architectures, research on lip reading has gained momentum. The aim of this study is to create a lip-reading dataset for Turkish digit recognition and to conduct predictive analyses on it. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models, including C3D, I3D, and 3DCNN+BiLSTM, were used. While LSTM models are effective at processing temporal data, 3DCNN-based models, which can process both spatial and temporal information, achieved higher accuracy in this study. Experimental results showed that the dataset containing only the lip region performed better; accuracy rates for CNN, LSTM, C3D, and I3D on the lip region were 67.12%, 75.53%, 86.32%, and 93.24%, respectively. Furthermore, an additional 1.23% improvement was achieved through ensemble learning, with the best result reaching 94.53% accuracy. By combining the strengths of different models, ensemble learning provided a meaningful improvement in overall performance. These results demonstrate that 3DCNN architectures and ensemble learning methods are highly successful at lip reading in the Turkish language. While the study focuses on Turkish digit recognition, the proposed methods have the potential to succeed in other languages and in broader lip-reading applications.
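The abstract does not say which ensemble scheme was used. As an illustration only, the sketch below assumes soft voting, in which the per-class probabilities from several already-trained classifiers are averaged and the highest-scoring class is chosen. The function name `soft_vote`, the optional `weights` parameter, and the toy probabilities are hypothetical and are not taken from the paper; three classes are used for brevity instead of ten digits.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Combine model outputs by (weighted) averaging of class probabilities.

    prob_sets[m][s][c] is the probability that model m assigns to class c
    for sample s. Returns the index of the winning class for each sample.
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0] * n_models  # plain average when no weights are given
    total = float(sum(weights))
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])

    predictions = []
    for s in range(n_samples):
        # Weighted mean probability of each class across all models.
        avg = [
            sum(w * prob_sets[m][s][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Toy outputs of two hypothetical models on two samples (illustrative values).
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

print(soft_vote([model_a, model_b]))
```

In this toy case the two models disagree on both samples, and the averaged probabilities decide between them. Passing `weights` lets a stronger model count for more, which is one common way an ensemble can edge past its best single member.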
Pages: 23
Related papers
50 in total
  • [31] Deep learning based hemorrhages classification using dcnn with optimized LSTM
    Veena, A.
    Gowrishankar, S.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (32) : 77595 - 77616
  • [32] Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network
    Aripin
    Setiawan, Abas
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2024, 2024
  • [33] Mini-3DCvT: a lightweight lip-reading method based on 3D convolution visual transformer
    Wang, Huijuan
    Cui, Boyan
    Yuan, Quanbo
    Pu, Gangqiang
    Liu, Xueli
    Zhu, Jie
    VISUAL COMPUTER, 2025, 41 (03) : 1957 - 1969
  • [34] 3DCNN: THREE-LAYERS DEEP CONVOLUTIONAL NEURAL NETWORK ARCHITECTURE FOR BREAST CANCER DETECTION USING CLINICAL IMAGE DATA
    Ul Haq, Amin
    Li, Jian Ping
    Saboor, Abdus
    Khan, Jalaluddin
    Zhou, Wang
    Jiang, Tao
    Rao, Mordecai F.
    Wali, Samad
    2020 17TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2020, : 83 - 88
  • [35] P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features
    Takei, Yuma
    Ishida, Takashi
    BIOENGINEERING-BASEL, 2021, 8 (03):
  • [36] Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping
    Ibrahim, M. Z.
    Mulvaney, D. J.
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2015, 30 : 219 - 233
  • [37] Lip reading using wavelet-based features and Random Forests classification
    Terissi, Lucas D.
    Parodi, Marianela
    Gomez, Juan C.
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 791 - 796
  • [38] Lip-reading via a DNN-HMM Hybrid System Using Combination of The Image-based and Model-based Features
    Rahmani, Mohammad Hasan
    Almasganj, Farshad
    2017 3RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND IMAGE ANALYSIS (IPRIA), 2017, : 195 - 199
  • [39] An orbicularis oris, buccinator, zygomaticus, and risorius muscle contraction classification for lip-reading during speech using sEMG signals on multi-channels
    Deny, J.
    Raja Sudharsan, R.
    Muthu Kumaran, E.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (03) : 593 - 600