Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish

Times Cited: 0
Authors
Gimeno-Gomez, David [1 ]
Martinez-Hinarejos, Carlos-D. [1 ]
Affiliations
[1] Univ Politecn Valencia, Pattern Recognit & Human Language Technol Res Ctr, Camino Vera S-N, Valencia 46022, Spain
Source
APPLIED SCIENCES-BASEL, 2023, Vol. 13, Issue 11
Keywords
visual speech recognition; speaker adaptation; fine-tuning; Adapters; Spanish language; end-to-end architectures;
DOI
10.3390/app13116521
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Visual speech recognition (VSR) is a challenging task that aims to interpret speech based solely on lip movements. Although remarkable results have recently been achieved in the field, the task remains an open research problem due to several challenges, such as visual ambiguities, intra-personal variability among speakers, and the complex modeling of silence. These challenges can be alleviated when the task is approached from a speaker-dependent perspective. Our work focuses on adapting end-to-end VSR systems to a specific speaker. Hence, we propose two different adaptation methods: one based on the conventional fine-tuning technique and the other on the so-called Adapters. We conduct a comparative study of their performance while also considering deployment aspects such as training time and storage cost. Results on the Spanish LIP-RTVE database show that both methods obtain recognition rates comparable to the state of the art, even when only a limited amount of training data is available. Although it incurs a deterioration in performance, the Adapters-based method offers a more scalable and efficient solution, significantly reducing training time and storage cost by up to 80%.
Pages: 16
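
To make the Adapters technique mentioned in the abstract more concrete, the sketch below shows a generic bottleneck adapter in PyTorch. It is only an illustrative outline, not the authors' implementation: the hidden and bottleneck dimensions, the module and function names, and the name-based parameter-freezing heuristic are assumptions made for this example.

```python
# Illustrative sketch of a bottleneck Adapter (generic design, not the
# paper's code). Dimensions and naming conventions are assumed.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity, up-project."""

    def __init__(self, hidden_dim: int = 256, bottleneck_dim: int = 64):
        super().__init__()
        self.adapter_down = nn.Linear(hidden_dim, bottleneck_dim)
        self.adapter_act = nn.ReLU()
        self.adapter_up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the pre-trained representation
        # intact when the adapter is close to the identity mapping.
        return x + self.adapter_up(self.adapter_act(self.adapter_down(x)))


def freeze_all_but_adapters(model: nn.Module) -> None:
    """Freeze the pre-trained backbone so only adapter weights are trained
    and stored per speaker (assumes adapter parameters carry 'adapter' in
    their names, as in the class above)."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name


if __name__ == "__main__":
    # Toy check: a batch of 4 sequences, 10 frames, 256-dim features.
    x = torch.randn(4, 10, 256)
    adapter = Adapter()
    print(adapter(x).shape)  # torch.Size([4, 10, 256])
```

Because only the small per-speaker adapter weights are updated and stored while the backbone stays frozen, training time and per-speaker storage scale with the bottleneck size rather than with the full model, which is the kind of saving the abstract quantifies at up to 80%.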