Visual Speech Recognition: Improving Speech Perception in Noise through Artificial Intelligence

Cited by: 4

Authors:

Raghavan, Arun M. [1]

Lipschitz, Noga [2]

Breen, Joseph T. [2]

Samy, Ravi N. [2]

Kohlberg, Gavriel D. [2,3]

Affiliations:

[1] Univ Cincinnati, Coll Med, Cincinnati, OH USA

[2] Univ Cincinnati, Cincinnati Childrens Hosp Med Ctr, Dept Otolaryngol Head & Neck Surg, Cincinnati, OH USA

[3] Univ Washington, Dept Otolaryngol Head & Neck Surg, Seattle, WA 98195 USA

Source:

OTOLARYNGOLOGY-HEAD AND NECK SURGERY | 2020 / Vol. 163 / Issue 04

Keywords:

artificial intelligence; speech-in-noise; hearing loss; speech perception; visual speech recognition; computer vision; lip reading; DIRECTIONAL MICROPHONES; HEARING-AIDS; CHILDREN; REDUCTION; LIFE;

DOI:

10.1177/0194599820924331

Chinese Library Classification number:

R76 [Otorhinolaryngology];

Discipline classification code:

100213;

Abstract:

Objectives: To compare speech perception (SP) in noise for normal-hearing (NH) individuals and individuals with hearing loss (IWHL) and to demonstrate improvements in SP with use of a visual speech recognition program (VSRP).

Study Design: Single-institution prospective study.

Setting: Tertiary referral center.

Subjects and Methods: Eleven NH and 9 IWHL participants were tested in a sound-isolated booth, facing a speaker through a window. In non-VSRP conditions, SP was evaluated on 40 Bamford-Kowal-Bench speech-in-noise test (BKB-SIN) sentences presented by the speaker at 50 A-weighted decibels (dBA), with multiperson babble noise presented at levels from 50 to 75 dBA. SP was defined as the percentage of words correctly identified. In VSRP conditions, an infrared camera tracked 35 points around the speaker's lips in real time during speech. Lip movement data were translated into text by a neural network-based VSRP developed in-house. SP was evaluated as in the non-VSRP conditions on 42 BKB-SIN sentences, with the VSRP output additionally presented to the listener on a screen.

Results: In high-noise conditions (70-75 dBA) without VSRP, NH listeners achieved significantly higher SP than IWHL listeners (38.7% vs 25.0%, P = .02). NH listeners were significantly more accurate with VSRP than without it (75.5% vs 38.7%, P < .0001), as were IWHL listeners (70.4% vs 25.0%, P < .0001). With VSRP, no significant difference in SP was observed between NH and IWHL listeners (75.5% vs 70.4%, P = .15).

Conclusions: The VSRP significantly increased SP in high-noise conditions for both NH and IWHL participants and eliminated the difference in SP accuracy between the two groups.
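The abstract defines SP as the percentage of words correctly identified. The Python sketch below shows one way such a score could be computed for a single sentence. It is an illustration only: the abstract does not describe the authors' scoring procedure, and the function names `tokenize` and `percent_words_correct`, the order-independent matching rule, and the example sentence are all assumptions made here, not details from the study or from the BKB-SIN materials.

```python
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Lowercase and keep only letter/apostrophe runs so punctuation is ignored.
    return re.findall(r"[a-z']+", sentence.lower())


def percent_words_correct(target: str, response: str) -> float:
    """Percentage of target words that also occur in the response.

    Assumed rule (not from the source): matching ignores word order, and each
    response word can be credited against at most one target word.
    """
    target_words = tokenize(target)
    if not target_words:
        raise ValueError("target sentence has no words")
    remaining = Counter(tokenize(response))
    hits = 0
    for word in target_words:
        if remaining[word] > 0:
            remaining[word] -= 1
            hits += 1
    return 100.0 * hits / len(target_words)


# Hypothetical sentence pair, not taken from the BKB-SIN lists.
score = percent_words_correct("The dog ran home", "the dog went home")
print(score)  # 3 of 4 target words matched
```

A clinical protocol may instead score only designated key words in each sentence, so this rule would need to be adapted before use with real test data.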

Pages: 771-777

Page count: 7
