Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition

Cited by: 4
Authors
Brahme, Aparna [1]
Bhadade, Umesh [2]
Affiliations
[1] METs Inst Engn, Adgaon 422003, Nashik, India
[2] SSBTs Coll Engn & Technol, Jalgaon, Maharashtra, India
Keywords
Visual speech recognition; Viseme; language identification; Marathi; ACTIVE SHAPE MODELS; ALGORITHM
DOI
10.1142/S0219467820500291
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze how the choice of visual speech unit used to transcribe visual speech affects language recognition. We propose a new approach, word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. In contrast to the traditional visemic approach, we also propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. On an English and Marathi digit classification task, the results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and 58% WRR and 77% LRR in speaker-independent cases. The proposed approach is also evaluated on continuous speech input. The results show that a spoken language identification rate of 50% is achievable using only 1 s of speech, even though the WRR using VSR is below 10%. There is also an improvement of about 5% in language discrimination compared with traditional visemic approaches.
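The idea of discriminating languages with a word bigram language model applied to recognized words can be illustrated with a minimal sketch. This is not the authors' implementation: the toy digit sequences, the add-one smoothing, the shared vocabulary, and all function names below are assumptions made purely for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, using "<s>" as a start marker."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams

def log_prob(words, model, vocab_size):
    """Add-one smoothed log-probability of a word sequence (assumed smoothing)."""
    contexts, bigrams = model
    padded = ["<s>"] + list(words)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total

def identify_language(words, models, vocab_size):
    """Return the language whose bigram model scores the sequence highest."""
    return max(models, key=lambda lang: log_prob(words, models[lang], vocab_size))

# Hypothetical toy training data: transliterated digit words, not from the paper.
training = {
    "English": [["one", "two", "three"], ["four", "five", "six"], ["two", "three", "four"]],
    "Marathi": [["ek", "don", "teen"], ["char", "paach", "saha"], ["don", "teen", "char"]],
}
vocab_size = len({w for sents in training.values() for s in sents for w in s})
models = {lang: train_bigram(sents) for lang, sents in training.items()}

# A recognized word sequence (e.g. the output of a word recognizer) is scored
# under each language model; the best-scoring language is the decision.
print(identify_language(["two", "three"], models, vocab_size))
print(identify_language(["don", "teen"], models, vocab_size))
```

Because the decision depends on which model scores the whole sequence higher, a language can in principle still be identified when some individual words are misrecognized, which is consistent with the abstract's observation that language identification remains possible at low WRR.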
Pages: 27