Audio-visual speech recognition techniques in augmented reality environments

Cited by: 15

Authors:

Mirzaei, Mohammad Reza [1]

Ghorshi, Seyed [1]

Mortazavi, Mohammad [1]

Affiliations:

[1] Sharif Univ Technol, Sch Sci & Engn, Kish Isl, Iran

Source:

VISUAL COMPUTER | 2014 / Vol. 30 / Issue 03

Keywords:

Augmented reality; Audio-visual speech recognition; Augmented reality environments; Communication; Deaf people; VIRTUAL-REALITY;

DOI:

10.1007/s00371-013-0841-1

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can be used to help people with disabilities. Many of these studies have been performed only in their specialized field. Audio-Visual Speech Recognition (AVSR) is one of the advances in ASR technology that combines audio, video, and facial expressions to capture a narrator's voice. In this paper, we combine AR and AVSR technologies to make a new system to help deaf and hard-of-hearing people. Our proposed system can take a narrator's speech instantly, convert it into readable text, and show the text directly on an AR display. Therefore, in this system, deaf people can read the narrator's speech easily. In addition, people do not need to learn sign language to communicate with deaf people. The evaluation results show that this system has a lower word error rate than ASR and Visual Speech Recognition (VSR) in different noisy conditions. Furthermore, the results of using AVSR techniques show that the recognition accuracy of the system is improved in noisy places. Also, the results of a survey conducted with 100 deaf people show that more than 80% of them are very interested in using our system as an assistant in portable devices to communicate with people.


Pages: 245-257

Page count: 13

50 items in total

[1] Audio-visual speech recognition techniques in augmented reality environments
Mohammad Reza Mirzaei
Seyed Ghorshi
Mohammad Mortazavi
[J]. The Visual Computer, 2014, 30 : 245 - 257
[2] Audio-Visual Speech Recognition in Noisy Audio Environments
Palecek, Karel
Chaloupka, Josef
[J]. 2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
[3] Information Fusion Techniques in Audio-Visual Speech Recognition
Karabalkan, H.
Erdogan, H.
[J]. 2009 IEEE 17TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 734 - 737
[4] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
Hwang, Jung-Wook
Park, Jeongkyun
Park, Rae-Hong
Park, Hyung-Min
[J]. APPLIED ACOUSTICS, 2023, 211
[5] A Comparison of Model Validation Techniques for Audio-Visual Speech Recognition
Seong, Thum Wei
Ibrahim, Mohd Zamri
Arshad, Nurul Wahidah Binti
Mulvaney, D. J.
[J]. IT CONVERGENCE AND SECURITY 2017, VOL 1, 2018, 449 : 112 - 119
[6] An audio-visual speech recognition with a new mandarin audio-visual database
Liao, Wen-Yuan
Pao, Tsang-Long
Chen, Yu-Te
Chang, Tsun-Wei
[J]. INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
[7] Deep Audio-Visual Speech Recognition
Afouras, Triantafyllos
Chung, Joon Son
Senior, Andrew
Vinyals, Oriol
Zisserman, Andrew
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
[8] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
Estellers, Virginia
Thiran, Jean-Philippe
[J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
[9] Audio-visual integration for speech recognition
Kober, R
Harz, U
[J]. NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
[10] Audio-visual speech recognition by speechreading
Zhang, XZ
Mersereau, RM
Clements, MA
[J]. DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072
