Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition

Cited by: 4
Authors
Brahme, Aparna [1 ]
Bhadade, Umesh [2 ]
Affiliations
[1] METs Inst Engn, Adgaon 422003, Nashik, India
[2] SSBTs Coll Engn & Technol, Jalgaon, Maharashtra, India
Keywords
Visual speech recognition; Viseme; language identification; Marathi; ACTIVE SHAPE MODELS; ALGORITHM
DOI
10.1142/S0219467820500291
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze how the choice of visual speech unit used to transcribe visual speech affects language recognition. We propose a new approach, word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. In contrast to the traditional visemic approach, we also propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. On an English and Marathi digit classification task, the results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and 58% WRR and 77% LRR in speaker-independent cases. The proposed approach is also evaluated on continuous speech input. There, the results show that a spoken language identification rate of 50% is achievable using only 1 s of speech, even though the WRR using VSR is below 10%. Language discrimination also improves by about 5% compared with traditional visemic approaches.
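The idea of scoring a recognized word sequence with per-language word bigram models can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny training sequences, the transliterated Marathi digit words, the shared vocabulary, and the add-one smoothing are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # sentence-start marker (an assumption of this sketch)

def train_bigram_lm(sentences, vocab):
    """Return a function giving add-one-smoothed log P(cur | prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = [START] + list(words)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    vocab_size = len(vocab)

    def log_prob(prev, cur):
        # Add-one smoothing keeps unseen bigrams from scoring -infinity.
        return math.log(
            (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        )

    return log_prob

def score(words, log_prob):
    """Sum of bigram log-probabilities of a recognized word sequence."""
    padded = [START] + list(words)
    return sum(log_prob(p, c) for p, c in zip(padded, padded[1:]))

def identify_language(words, models):
    """Pick the language whose bigram model scores the sequence highest."""
    return max(models, key=lambda lang: score(words, models[lang]))

# Toy data: illustrative digit sequences, not taken from the paper.
english = [["one", "two", "three"], ["two", "three", "one"], ["three", "one", "two"]]
marathi = [["ek", "don", "tin"], ["don", "tin", "ek"], ["tin", "ek", "don"]]
vocab = {w for s in english + marathi for w in s}

models = {
    "english": train_bigram_lm(english, vocab),
    "marathi": train_bigram_lm(marathi, vocab),
}

# A noisy recognizer output: the last word is misrecognized.
print(identify_language(["one", "two", "don"], models))
```

Because the decision rests on the whole sequence rather than on any single word, a sequence with some misrecognized words can still be assigned to the right language, which is consistent with the abstract's observation that language identification remains possible at low WRR.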
Pages: 27
    Luettin, Juergen
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151