Survey on automatic lip-reading in the era of deep learning

Cited by: 71
Authors
Fernandez-Lopez, Adriana [1]
Sukno, Federico M. [1]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona, Spain
Funding
European Union Horizon 2020;
Keywords
Automatic lip-reading; Audio-visual corpora; Visual speech decoding; Deep learning systems; Multi-view lip-reading; AUDIOVISUAL SPEECH RECOGNITION; ACTIVE APPEARANCE MODELS; FEATURE-EXTRACTION; DATABASE; FEATURES;
DOI
10.1016/j.imavis.2018.07.002
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the last few years, there has been an increasing interest in developing systems for Automatic Lip-Reading (ALR). As in other computer vision applications, methods based on Deep Learning (DL) have become very popular and have substantially pushed forward the achievable performance. In this survey, we review ALR research during the last decade, highlighting the progression from approaches that predate DL (which we refer to as traditional) toward end-to-end DL architectures. We provide a comprehensive list of the audio-visual databases available for lip-reading, describing what tasks they can be used for, their popularity and their most important characteristics, such as the number of speakers, vocabulary size, recording settings and total duration. In correspondence with the shift toward DL, we show that there is a clear tendency toward large-scale datasets targeting realistic application settings and large numbers of samples per class. On the other hand, we summarize, discuss and compare the different ALR systems proposed in the last decade, separately considering traditional and DL approaches. We present a quantitative analysis of the different systems by organizing them in terms of the task that they target (e.g. recognition of letters or digits and words or sentences) and comparing their reported performance on the most commonly used datasets. As a result, we find that DL architectures perform similarly to traditional ones for simpler tasks but report significant improvements in more complex tasks, such as word or sentence recognition, with up to 40% improvement in word recognition rates. Hence, we provide a detailed description of the available ALR systems based on end-to-end DL architectures and identify a tendency to focus on the modeling of temporal context as the key to advancing the field. Such modeling is dominated by recurrent neural networks due to their ability to retain context at multiple scales (e.g. short- and long-term information). In this sense, current efforts tend toward techniques that allow a more comprehensive modeling and interpretability of the retained context. © 2018 Elsevier B.V. All rights reserved.
Pages: 53-72
Page count: 20