Visual units and confusion modelling for automatic lip-reading

Cited by: 22
Authors
Howell, Dominic [1]
Cox, Stephen [1]
Theobald, Barry [1]
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Keywords
Lip-reading; Speech recognition; Visemes; Weighted finite state transducers; Confusion matrices; Confusion modelling; Robust speech recognition; Finite-state transducers; Audiovisual speech
DOI
10.1016/j.imavis.2016.03.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 1-12
Number of pages: 12
Related papers
50 items in total
  • [1] Visual speech features representation for automatic lip-reading
    Sagheer, A
    Tsuruta, N
    Taniguchi, RK
    Maeda, S
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 781 - 784
  • [2] Visual words for lip-reading
    Hassanat, Ahmad B. A.
    Jassim, Sabah
    [J]. MOBILE MULTIMEDIA/IMAGE PROCESSING, SECURITY, AND APPLICATIONS 2010, 2010, 7708
  • [3] Automatic lip localization and feature extraction for lip-reading
    Werda, Salah
    Mahdi, Walid
    Ben Hamadou, Abdehnajid
    [J]. VISAPP 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOLUME IU/MTSV, 2007, : 268 - +
  • [4] Visual-speech-pass filtering for robust automatic lip-reading
    Jong-Seok Lee
    [J]. Pattern Analysis and Applications, 2014, 17 : 611 - 621
  • [5] Visual-speech-pass filtering for robust automatic lip-reading
    Lee, Jong-Seok
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (03) : 611 - 621
  • [6] Method for visual analysis of driver's face for automatic lip-reading in the wild
    Axyonov, A. A.
    Ryumin, D. A.
    Kashevnik, A. M.
    Ivanko, D., V
    Karpov, A. A.
    [J]. COMPUTER OPTICS, 2022, 46 (06) : 955 - +
  • [7] AUTOMATIC LIP-READING OF HEARING IMPAIRED PEOPLE
    Ivanko, D.
    Ryumin, D.
    Karpov, A.
    [J]. INTERNATIONAL WORKSHOP ON PHOTOGRAMMETRIC AND COMPUTER VISION TECHNIQUES FOR VIDEO SURVEILLANCE, BIOMETRICS AND BIOMEDICINE, 2019, 42-2 (W12): : 97 - 101
  • [8] LIP-READING
    Lindquist, Ida P.
    [J]. VOLTA REVIEW, 1917, 19 (04) : 188 - 188
  • [9] LIP-READING
    Naber, Joseph E.
    [J]. VOLTA REVIEW, 1920, 22 (08) : 527 - 528
  • [10] LIP-READING
    Wilson, Ida H.
    [J]. VOLTA REVIEW, 1920, 22 (04) : 221 - 222