Visual speech recognition for small scale dataset using VGG16 convolution neural network

Cited by: 0
Authors
Shashidhar R
Sudarshan Patilkulkarni
Affiliation
[1] JSS Science and Technology University, Department of Electronics and Communication Engineering
Source
Keywords
Visual speech recognition; Lip-reading; Convolutional neural network; VGG16;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Visual speech recognition interprets speech from a speaker's lip movements; the speech is recognized solely from the shape and movement of the lips. This technique not only helps people with hearing impairments but can also be used for professional lip reading, with applications in crime investigation and forensics. It plays a crucial role in these domains because a person's speech is converted to text. This work proposes to enhance visual speech recognition from video. A dataset was created and used for implementation and verification. The aim was to recognize words from lip movement alone, using video in the absence of audio, which mostly helps in extracting words from silent video for forensic and crime analysis. The proposed method employs the pre-trained VGG16 convolutional neural network architecture for classification and recognition. It was observed that the visual modality improves the performance of the speech recognition system. Finally, the results were compared with the Hahn Convolutional Neural Network (HCNN) architecture. The proposed model achieves an accuracy of 76% in visual speech recognition.
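As a rough illustration of what adapting VGG16 to a small word vocabulary involves, the sketch below counts the parameters of the standard VGG16 layout and shows how the classifier head shrinks when the 1000-class ImageNet output is swapped for a small number of word classes. The abstract does not state the vocabulary size, input resolution, or which layers were retrained, so the 10-class figure, the 224x224 input, and the helper name `vgg16_param_counts` are assumptions made only for this example.

```python
# Standard VGG16 convolutional layout: integers are the output channels of
# 3x3 convolutions, "M" marks a 2x2 max-pooling step that halves the size.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_counts(num_classes, input_size=224, in_channels=3):
    """Return (conv_params, head_params) for a VGG16-style network.

    num_classes is the size of the output layer; for lip reading it would be
    the number of words in the vocabulary (hypothetical here).
    """
    conv_params = 0
    channels, size = in_channels, input_size
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # pooling has no trainable parameters
        else:
            # 3x3 kernel weights plus one bias per output channel
            conv_params += 3 * 3 * channels * item + item
            channels = item

    flat = channels * size * size  # flattened feature vector fed to the head
    head_params = 0
    for fan_in, fan_out in [(flat, 4096), (4096, 4096), (4096, num_classes)]:
        head_params += fan_in * fan_out + fan_out  # weights plus biases
    return conv_params, head_params


if __name__ == "__main__":
    for n in (1000, 10):  # ImageNet head vs. an assumed 10-word vocabulary
        conv, head = vgg16_param_counts(n)
        print(f"{n:>4} classes: conv={conv:,} head={head:,} total={conv + head:,}")
```

The convolutional stack is identical in both cases, which is why a pre-trained VGG16 can be reused as a feature extractor on a small dataset while only the output layer (or the whole head) is replaced for the new word classes.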
Pages: 28941-28952
Page count: 11
Related papers
50 in total
  • [41] Classifying Normal and Crackle Lung Sounds Using VGG16 Convolutional Layers and Neuro-Fuzzy Network
    Kim, Minwoo
    Bae, Jinhee
    Lim, Joon S.
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2020, 126 : 6 - 7
  • [42] CLASSIFICATION RECOGNITION OF LONG DISTANCE OIL AND GAS PIPELINE GIRTH WELDS MFL SIGNAL DIAGRAMS BASED ON VGG16 NETWORK
    Geng, Liyuan
    Dong, Shaohua
    Zheng, Li
    Li, Shengwei
    [J]. PROCEEDINGS OF ASME 2022 PRESSURE VESSELS AND PIPING CONFERENCE, PVP2022, VOL 5, 2022,
  • [43] COVID-19 Detection Model on Chest CT Scan and X-ray Images Using VGG16 Convolutional Neural Network
    Latisha, Shannen
    Halim, Albert Christopher
    Ricardo, Regan
    Suhartono, Derwin
    [J]. 2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020,
  • [44] A novel approach using deep convolutional neural network to classify the photographs based on leading-line by fine-tuning the pre-trained VGG16 neural network
    Debnath, Soma
    Roy, Ratnakirti
    Changder, Suvamoy
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 83 (1) : 3189 - 3214
  • [45] Transfer learning with VGG16 deep convolutional neural network model effectively differentiates between subtypes of bright and dark lesions
    Kay, Anna
    Nguyen, Mickey
    [J]. INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2023, 64 (08)
  • [46] Speech Recognition using Artificial Neural Network
    Gupta, Arpita
    Joshi, Akshay
    [J]. PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), 2018, : 68 - 71
  • [47] Speech perception based on mapping speech to image by using convolution neural network
    Trung, Quang Nguyen
    Duy, The Bui
    [J]. PROCEEDINGS OF 2018 5TH NAFOSTED CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS 2018), 2018, : 255 - 259
  • [48] Speech Emotion Recognition Based on Convolution Neural Network combined with Random Forest
    Zheng, Li
    Li, Qiao
    Ban, Hua
    Liu, Shuhua
    [J]. PROCEEDINGS OF THE 30TH CHINESE CONTROL AND DECISION CONFERENCE (2018 CCDC), 2018, : 4143 - 4147
  • [49] Audio-Visual (Multimodal) Speech Recognition System Using Deep Neural Network
    Paulin, Hebsibah
    Milton, R. S.
    JanakiRaman, S.
    Chandraprabha, K.
    [J]. JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 3963 - 3974
  • [50] Combining audio and visual speech recognition using LSTM and deep convolutional neural network
    Shashidhar R.
    Patilkulkarni S.
    Puneeth S.B.
    [J]. International Journal of Information Technology, 2022, 14 (7) : 3425 - 3436