Lip-reading via a DNN-HMM Hybrid System Using Combination of The Image-based and Model-based Features

Authors
Rahmani, Mohammad Hasan [1]
Almasganj, Farshad [1]
Affiliation
[1] Amirkabir Univ Technol, Tehran Polytech, Biomed Engn Dept, Tehran, Iran
Keywords
lip-reading; feature extraction; deep auto-encoder; DBNF; NEURAL-NETWORKS
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Finding features that better represent a speaker's visual information during speech production is still an open problem, and it strongly affects the quality of lip-reading and Audio-Visual Speech Recognition (AVSR). In this paper, three types of visual features, covering both image-based and model-based approaches, are investigated in a lip-reading task. The three feature sets compared are: the raw gray-level pixel values of the lip Region of Interest (ROI), a geometric representation of the lip shape, and Deep Bottle-neck Features (DBNFs) extracted from a 6-layer Deep Auto-encoder Neural Network (DANN). Two recognition systems, the conventional GMM-HMM and the state-of-the-art DNN-HMM hybrid, are used to perform isolated and connected digit recognition. The results indicate that high-level information extracted from the lip ROI by deep layers can represent the visual modality while packing a large amount of information into a low-dimensional feature vector. Moreover, on the test data, the DBNFs showed an average relative improvement of 15.4% over the shape features, and the shape features showed an average relative improvement of 20.4% over the ROI features.
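The abstract describes DBNFs as the activations of the bottleneck layer of a deep auto-encoder applied to the lip ROI, and it compares feature sets by relative improvement. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the layer sizes, the activations (sigmoid hidden layers, linear bottleneck) and all names (`build_encoder`, `extract_bottleneck`, `relative_improvement`, `LAYER_SIZES`) are hypothetical, and the weights are random placeholders standing in for a trained 6-layer DANN. It shows only the encoder forward pass that yields bottleneck features, not auto-encoder training or the GMM-HMM and DNN-HMM recognizers.

```python
import math
import random

# Hypothetical sizes: a 16x12 gray-level lip ROI (192 pixels) compressed
# through two hidden layers to a 10-dimensional bottleneck. The decoder half,
# used only while training the auto-encoder, is omitted here.
LAYER_SIZES = [192, 64, 32, 10]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def build_encoder(sizes, rng):
    """Random placeholder weights; in practice these come from a trained DANN."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def extract_bottleneck(roi_pixels, encoder):
    """Map raw 0-255 ROI pixels to bottleneck features.

    Assumed design: sigmoid hidden layers and a linear bottleneck layer.
    """
    h = [p / 255.0 for p in roi_pixels]  # scale gray levels to [0, 1]
    for i, (weights, biases) in enumerate(encoder):
        z = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]
        # The last encoder layer is the bottleneck: keep its values linear.
        h = z if i == len(encoder) - 1 else [sigmoid(v) for v in z]
    return h


def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain (%) of `new` over `baseline` for a higher-is-better score."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    rng = random.Random(0)
    encoder = build_encoder(LAYER_SIZES, rng)
    roi = [rng.randint(0, 255) for _ in range(LAYER_SIZES[0])]
    dbnf = extract_bottleneck(roi, encoder)
    print(len(dbnf))  # 10
    print(relative_improvement(50.0, 60.0))  # 20.0 (hypothetical accuracies)
```

Because the abstract does not say whether relative improvement is computed on accuracy or as a relative reduction in error rate, `relative_improvement` here simply assumes a higher-is-better score. Note also that the 15.4% and 20.4% figures are measured against different baselines (the shape features and the ROI features, respectively), so they are not directly additive.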
Pages: 195-199
Number of pages: 5