Visual units and confusion modelling for automatic lip-reading

Cited by: 22

Authors:

Howell, Dominic [1]

Cox, Stephen [1]

Theobald, Barry [1]

Affiliation:

[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England

Source:

IMAGE AND VISION COMPUTING | 2016 / Vol. 51

Keywords:

Lip-reading; Speech recognition; Visemes; Weighted finite state transducers; Confusion matrices; Confusion modelling; ROBUST SPEECH RECOGNITION; FINITE-STATE TRANSDUCERS; AUDIOVISUAL SPEECH;

DOI:

10.1016/j.imavis.2016.03.003

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy. (C) 2016 Elsevier B.V. All rights reserved.
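
The abstract's idea of learning systematic substitutions and deletions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `align` and `confusion_probs`, the `<del>` marker, and the toy phoneme sequences are all assumptions made for illustration. The sketch aligns reference and recognised unit sequences by edit distance and estimates, for each reference unit, the relative frequency with which it is kept, substituted, or deleted.

```python
from collections import Counter, defaultdict

DEL = "<del>"  # hypothetical marker for a reference unit that was deleted


def align(ref, hyp):
    """Edit-distance alignment; returns (ref_unit, hyp_unit) pairs.

    Insertions in hyp are skipped, since only substitutions and
    deletions of reference units are counted in this sketch.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], DEL))  # deletion of a reference unit
            i -= 1
        else:
            j -= 1  # insertion in hyp, ignored here
    pairs.reverse()
    return pairs


def confusion_probs(sequence_pairs):
    """Estimate P(recognised unit | reference unit) by relative frequency."""
    counts = defaultdict(Counter)
    for ref, hyp in sequence_pairs:
        for r, h in align(ref, hyp):
            counts[r][h] += 1
    return {
        r: {h: c / sum(row.values()) for h, c in row.items()}
        for r, row in counts.items()
    }


# Toy data (invented): /b/ is confused with /p/ once, /t/ is deleted once.
data = [
    (["b", "a", "t"], ["p", "a", "t"]),
    (["b", "a", "t"], ["b", "a"]),
]
probs = confusion_probs(data)
```

In a weighted finite-state transducer setting, probabilities of this kind would typically be converted to negative-log weights on the arcs of a confusion transducer that is composed with the rest of the decoding cascade; how the paper itself does this is not specified in the abstract.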


Pages: 1 - 12

Page count: 12

50 records in total

[1] Visual speech features representation for automatic lip-reading
Sagheer, A
Tsuruta, N
Taniguchi, RK
Maeda, S
[J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 781 - 784
[2] Visual words for lip-reading
Hassanat, Ahmad B. A.
Jassim, Sabah
[J]. MOBILE MULTIMEDIA/IMAGE PROCESSING, SECURITY, AND APPLICATIONS 2010, 2010, 7708
[3] Automatic lip localization and feature extraction for lip-reading
Werda, Salah
Mahdi, Walid
Ben Hamadou, Abdehnajid
[J]. VISAPP 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOLUME IU/MTSV, 2007, : 268 - +
[4] Visual-speech-pass filtering for robust automatic lip-reading
Jong-Seok Lee
[J]. Pattern Analysis and Applications, 2014, 17 : 611 - 621
[5] Visual-speech-pass filtering for robust automatic lip-reading
Lee, Jong-Seok
[J]. PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (03) : 611 - 621
[6] Method for visual analysis of driver's face for automatic lip-reading in the wild
Axyonov, A. A.
Ryumin, D. A.
Kashevnik, A. M.
Ivanko, D. V.
Karpov, A. A.
[J]. COMPUTER OPTICS, 2022, 46 (06) : 955 - +
[7] AUTOMATIC LIP-READING OF HEARING IMPAIRED PEOPLE
Ivanko, D.
Ryumin, D.
Karpov, A.
[J]. INTERNATIONAL WORKSHOP ON PHOTOGRAMMETRIC AND COMPUTER VISION TECHNIQUES FOR VIDEO SURVEILLANCE, BIOMETRICS AND BIOMEDICINE, 2019, 42-2 (W12) : 97 - 101
[8] LIP-READING
Lindquist, Ida P.
[J]. VOLTA REVIEW, 1917, 19 (04) : 188 - 188
[9] LIP-READING
Naber, Joseph E.
[J]. VOLTA REVIEW, 1920, 22 (08) : 527 - 528
[10] LIP-READING
Wilson, Ida H.
[J]. VOLTA REVIEW, 1920, 22 (04) : 221 - 222
