Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition

Times cited: 4

Authors:

Brahme, Aparna [1]

Bhadade, Umesh [2]

Affiliations:

[1] METs Inst Engn, Adgaon 422003, Nashik, India

[2] SSBTs Coll Engn & Technol, Jalgaon, Maharashtra, India

Source:

INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS | 2020 / Vol. 20 / No. 04

Keywords:

Visual speech recognition; Viseme; language identification; Marathi; ACTIVE SHAPE MODELS; ALGORITHM

DOI:

10.1142/S0219467820500291

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze how the choice of visual speech unit used to transcribe visual speech affects language recognition. We propose a new approach, word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. In addition, as opposed to the traditional visemic approach, we propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. On an English and Marathi digit classification task, the results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and a WRR of 58% and an LRR of 77% in speaker-independent cases. The proposed approach is also evaluated on continuous speech input: the results show that a spoken language identification rate of 50% is achievable using only 1 s of speech, even though the WRR of visual speech recognition is below 10%. Finally, an improvement of about 5% in language discrimination is obtained compared with traditional visemic approaches.
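The abstract describes WRWLM only at a high level: a recognizer transcribes the visual speech into words, and per-language word bigram language models decide which language the word sequence came from. The sketch below illustrates that decision step under stated assumptions: add-one (Laplace) smoothing, a vocabulary shared by all language models, and hypothetical toy digit transcripts. None of these details, nor the names `BigramLM`, `identify_language` and `recognition_rate`, come from the paper. It also shows a plain percent-correct computation of the kind used for WRR on isolated words or for LRR.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers


class BigramLM:
    """Word bigram language model with add-one (Laplace) smoothing.

    Add-one smoothing is an assumption of this sketch; the abstract does
    not say how the paper estimates its bigram probabilities.
    """

    def __init__(self, sentences, vocab):
        # Possible next tokens: every vocabulary word plus the end marker.
        self.num_next = len(set(vocab)) + 1
        self.history = Counter()  # count of each token used as a bigram history
        self.bigrams = Counter()  # count of each (previous, current) pair
        for words in sentences:
            tokens = [BOS, *words, EOS]
            for prev, cur in zip(tokens, tokens[1:]):
                self.history[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def log_prob(self, words):
        """Smoothed log-probability of a word sequence, including the end marker."""
        tokens = [BOS, *words, EOS]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.num_next))
            for p, c in zip(tokens, tokens[1:])
        )


def identify_language(words, models):
    """Pick the language whose bigram LM scores the recognized words highest."""
    return max(models, key=lambda lang: models[lang].log_prob(words))


def recognition_rate(predicted, reference):
    """Percent of items recognized correctly (e.g. WRR for isolated words, or LRR)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty sequences")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


if __name__ == "__main__":
    # Hypothetical toy transcripts (romanized Marathi digits); not data from the paper.
    english = [["one", "two", "three"], ["four", "five"], ["two", "two", "nine"]]
    marathi = [["ek", "don", "teen"], ["char", "pach"], ["don", "don", "nau"]]
    # A shared vocabulary keeps the two models' scores comparable.
    vocab = {w for s in english + marathi for w in s}
    models = {
        "English": BigramLM(english, vocab),
        "Marathi": BigramLM(marathi, vocab),
    }
    # Hypothetical recognizer output for two test utterances.
    hyps = [["don", "teen"], ["two", "nine"]]
    langs = [identify_language(h, models) for h in hyps]
    print(langs)  # ['Marathi', 'English']
    print(recognition_rate(langs, ["Marathi", "English"]))  # 100.0
```

Scores are summed in log space to avoid numerical underflow on longer sequences. The shared vocabulary matters because the add-one denominator depends on vocabulary size, so models built over different vocabularies would not be directly comparable.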

Pages: 27
