Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish

Authors
Gimeno-Gomez, David [1]
Martinez-Hinarejos, Carlos-D. [1]
Affiliation
[1] Univ Politecn Valencia, Pattern Recognit & Human Language Technol Res Ctr, Camino Vera S-N, Valencia 46022, Spain
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / No. 11
Keywords
visual speech recognition; speaker adaptation; fine-tuning; Adapters; Spanish language; end-to-end architectures
DOI
10.3390/app13116521
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Visual speech recognition (VSR) is a challenging task that aims to interpret speech based solely on lip movements. Although remarkable results have recently been reached in the field, VSR remains an open research problem due to challenges such as visual ambiguities, the inter-personal variability among speakers, and the complex modeling of silence. These challenges can, however, be alleviated when the task is approached from a speaker-dependent perspective. Our work focuses on adapting end-to-end VSR systems to a specific speaker. To this end, we propose two adaptation methods: one based on the conventional fine-tuning technique and one based on the so-called Adapters. We conduct a comparative study in terms of recognition performance while also considering deployment aspects such as training time and storage cost. Results on the Spanish LIP-RTVE database show that both methods obtain recognition rates comparable to the state of the art, even when only a limited amount of training data is available. Although it incurs some loss in performance, the Adapters-based method offers a more scalable and efficient solution, reducing training time and storage cost by up to 80%.
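The abstract contrasts updating the whole network for each speaker (fine-tuning) with training only small Adapter modules. The following is a hedged sketch, assuming standard residual bottleneck adapters (down-projection, ReLU, up-projection, skip connection) of the kind commonly used for parameter-efficient transfer learning; the abstract does not describe the paper's actual adapter design, and the helper names (`adapter_forward`, `adapter_params`, `finetune_storage`, `adapter_storage`) and all layer sizes are hypothetical. It illustrates why per-speaker storage shrinks: fine-tuning keeps one full model copy per speaker, whereas adapters keep one frozen, shared backbone plus a small parameter set per speaker.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j] is row i, column j


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute w @ x + b using plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def adapter_forward(h: Vector, w_down: Matrix, b_down: Vector,
                    w_up: Matrix, b_up: Vector) -> Vector:
    """Hypothetical bottleneck adapter: h + (W_up @ relu(W_down @ h + b_down) + b_up).

    Only w_down, b_down, w_up and b_up would be trained per speaker; the
    backbone that produced h stays frozen and is shared by all speakers.
    """
    z = [max(0.0, v) for v in matvec(w_down, h, b_down)]  # project d -> r, then ReLU
    u = matvec(w_up, z, b_up)  # project r -> d
    return [hi + ui for hi, ui in zip(h, u)]  # residual (skip) connection


def adapter_params(d_model: int, bottleneck: int, n_layers: int) -> int:
    """Trainable parameters of one adapter (weights + biases) per layer."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return (down + up) * n_layers


def finetune_storage(backbone: int, n_speakers: int) -> int:
    """Fine-tuning: one full copy of the model stored per speaker."""
    return backbone * n_speakers


def adapter_storage(backbone: int, per_speaker: int, n_speakers: int) -> int:
    """Adapters: one shared frozen backbone plus one adapter set per speaker."""
    return backbone + per_speaker * n_speakers


if __name__ == "__main__":
    # Hypothetical sizes, NOT taken from the paper.
    backbone = 50_000_000
    per_speaker = adapter_params(d_model=256, bottleneck=64, n_layers=12)
    print(per_speaker)                                   # 397056
    print(finetune_storage(backbone, 10))                # 500000000
    print(adapter_storage(backbone, per_speaker, 10))    # 53970560
```

With these made-up sizes, ten speakers need about 500 M stored parameters under fine-tuning versus about 54 M with adapters. This arithmetic only illustrates the scaling argument; the savings of up to 80% quoted above come from the paper's own experiments and are not reproduced here.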
Pages: 16