Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition

Cited by: 4
Authors
Brahme, Aparna [1]
Bhadade, Umesh [2]
Affiliations
[1] METs Inst Engn, Adgaon 422003, Nashik, India
[2] SSBTs Coll Engn & Technol, Jalgaon, Maharashtra, India
Keywords
Visual speech recognition; viseme; language identification; Marathi; active shape models; algorithm
DOI
10.1142/S0219467820500291
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze how the choice of visual speech unit used to transcribe visual speech affects language recognition. We propose a new approach, word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. In contrast to the traditional visemic approach, we also propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. On an English and Marathi digit classification task, the results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and 58% WRR and 77% LRR in speaker-independent cases. The proposed approach is also evaluated on continuous speech input. The results show that a spoken language identification rate of 50% is possible using only 1 s of speech, even though the WRR using Visual Speech Recognition is below 10%. There is also an improvement of about 5% in language discrimination compared to traditional visemic approaches.
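The language-discrimination step described in the abstract can be illustrated with a minimal sketch: one word bigram model per language scores a recognized word sequence, and the higher-scoring language is chosen. This is not the authors' implementation. The function names, the add-one smoothing, and the toy training data (transliterated Marathi digit words) are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left-context words, with start/end markers."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {"context": context_counts, "bigrams": bigram_counts, "vocab": vocab}


def log_likelihood(model, words, vocab_size):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    padded = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = model["bigrams"][(prev, cur)] + 1
        denominator = model["context"][prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def identify_language(models, words):
    """Return the best-scoring language and the per-language scores."""
    # Assumption: all models share one vocabulary so their scores are comparable.
    vocab_size = len(set().union(*(m["vocab"] for m in models.values())))
    scores = {
        lang: log_likelihood(model, words, vocab_size)
        for lang, model in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy training data (hypothetical): digit-word sequences per language.
models = {
    "english": train_bigram([["one", "two", "three"], ["two", "three", "four"]]),
    "marathi": train_bigram([["ek", "don", "teen"], ["don", "teen", "char"]]),
}

# In the paper's setting the word sequence would come from a visual word recognizer.
language, scores = identify_language(models, ["one", "two", "three"])
print(language, scores)
```

Because the decision depends only on relative scores across languages, a sequence containing recognition errors can still favour the correct language, which is one plausible reading of why the reported LRR exceeds the WRR.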
Pages: 27