Audio-visual speech recognition techniques in augmented reality environments

Cited by: 15
Authors
Mirzaei, Mohammad Reza [1]
Ghorshi, Seyed [1]
Mortazavi, Mohammad [1]
Affiliation
[1] Sharif Univ Technol, Sch Sci & Engn, Kish Isl, Iran
Source
VISUAL COMPUTER | 2014 / Vol. 30 / No. 03
Keywords
Augmented reality; Audio-visual speech recognition; Augmented reality environments; Communication; Deaf people; VIRTUAL-REALITY;
DOI
10.1007/s00371-013-0841-1
Chinese Library Classification
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract
Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can be used to help people with disabilities. Many of these studies have been performed only within their own specialized field. Audio-Visual Speech Recognition (AVSR) is one of the advances in ASR technology that combines audio, video, and facial expressions to capture a narrator's voice. In this paper, we combine AR and AVSR technologies to build a new system that helps deaf and hard-of-hearing people. Our proposed system can take a narrator's speech instantly, convert it into readable text, and show the text directly on an AR display. Therefore, with this system, deaf people can read the narrator's speech easily. In addition, people do not need to learn sign language to communicate with deaf people. The evaluation results show that this system has a lower word error rate than ASR and VSR in different noisy conditions. Furthermore, the results of using AVSR techniques show that the recognition accuracy of the system is improved in noisy places. Also, the results of a survey conducted with 100 deaf people show that more than 80% of them are very interested in using our system as an assistant on portable devices to communicate with people.
Pages: 245-257
Number of pages: 13
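The abstract compares the systems by word error rate (WER). As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code, which this record does not describe), the following function counts word-level substitutions, deletions, and insertions with a Levenshtein alignment and divides by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is a plain whitespace split, which is
    an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.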