Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish

Cited by: 0

Authors:

Gimeno-Gomez, David [1]

Martinez-Hinarejos, Carlos-D. [1]

Affiliations:

[1] Univ Politecn Valencia, Pattern Recognit & Human Language Technol Res Ctr, Camino Vera S-N, Valencia 46022, Spain

Source:

APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 11

Keywords:

visual speech recognition; speaker adaptation; fine-tuning; Adapters; Spanish language; end-to-end architectures;

DOI:

10.3390/app13116521

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Visual speech recognition (VSR) is a challenging task that aims to interpret speech based solely on lip movements. Although remarkable results have recently been reached in the field, the task remains an open research problem due to challenges such as visual ambiguities, the inter-personal variability among speakers, and the complex modeling of silence. These challenges can be alleviated when the task is approached from a speaker-dependent perspective. Our work focuses on the adaptation of end-to-end VSR systems to a specific speaker. To this end, we propose two different adaptation methods, one based on the conventional fine-tuning technique and one based on the so-called Adapters. We conduct a comparative study in terms of performance while considering deployment aspects such as training time and storage cost. Results on the Spanish LIP-RTVE database show that both methods obtain recognition rates comparable to the state of the art, even when only a limited amount of training data is available. Although it incurs a deterioration in performance, the Adapters-based method is a more scalable and efficient solution, reducing training time and storage cost by up to 80%.

Pages: 16

50 items in total

[1] Rapid speaker adaptation for continuous speech recognition
Lu, Ping
Wu, Ji
Wang, Zuoying
Lu, Dajin
Qinghua Daxue Xuebao/Journal of Tsinghua University, 2002, 42 (07) : 977 - 980
[2] A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition
Rodríguez, LJ
Torres, MI
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 433 - 440
[3] Speaker adaptation by modeling the speaker variation in a continuous speech recognition system
Strom, N
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996 : 989 - 992
[4] Discriminative speaker adaptation in Persian continuous speech recognition systems
Pirhosseinloo, Shadi
Ganj, Farshad Almas
4TH INTERNATIONAL CONFERENCE OF COGNITIVE SCIENCE, 2012, 32 : 296 - 301
[5] Speaker independent audio-visual continuous speech recognition
Liang, LH
Liu, XX
Zhao, YB
Pi, XB
Nefian, AV
IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002 : A25 - A28
[6] Speaker adaptation in the philips system for large vocabulary continuous speech recognition
Thelen, E
Aubert, X
Beyerlein, P
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997 : 1035 - 1038
[7] PREDICTIVE SPEAKER ADAPTATION IN SPEECH RECOGNITION
COX, S
COMPUTER SPEECH AND LANGUAGE, 1995, 9 (01) : 1 - 17
[8] Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition
Roupakia, Zoi
Ragni, Anton
Gales, Mark
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 1782 - 1785
[9] Speaker adaptation using probabilistic linear discriminant analysis for continuous speech recognition
Jeong, Y.
ELECTRONICS LETTERS, 2013, 49 (25) : 1641 - 1643
[10] Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
Das, Biswajit
Mandal, Sandipan
Mitra, Pabitra
Basu, Anupam
PATTERN RECOGNITION LETTERS, 2013, 34 (03) : 335 - 343
