Replacing Speaker-independent Recognition Task with Speaker-dependent Task for Lip-reading Using First Order Motion Model

Cited by: 1

Authors:

Kodama, Michinari [1]

Saitoh, Takeshi [1]

Affiliations:

[1] Kyushu Inst Technol, Kitakyushu, Fukuoka, Japan

Source:

THIRTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2021) | 2022 / Vol. 12083

Keywords:

Lip-reading; first order motion model; speaker-dependent recognition; speaker-independent recognition

DOI:

10.1117/12.2623640

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Lip-reading research tends to address the speaker-independent recognition task by collecting speech scenes from many speakers, but this data collection is time-consuming. This paper proposes a method to address the problem. The First Order Motion Model (FOMM) is a deep generative model that generates a video sequence from a source image according to a driving video. Our idea is to apply FOMM to all speech scenes in the dataset so that the generated speech scenes appear to be recorded from a single speaker. By applying FOMM as a preprocessing step, we replace the speaker-independent recognition task with a speaker-dependent one. We applied the proposed method to two publicly available databases, OuluVS and CUAVE, and confirmed that it improved recognition accuracy on both.
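
The preprocessing idea described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `unify_speaker`, the `animate` callable, and `dummy_animate` are hypothetical names introduced here, and `animate` stands in for whatever FOMM inference routine is actually used (source image plus driving frames in, generated frames out).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")

# Hypothetical signature for a FOMM-style inference call:
# (source_image, driving_frames) -> generated_frames
Animator = Callable[[Frame, Sequence[Frame]], List[Frame]]


def unify_speaker(
    dataset: Sequence[Tuple[str, str, Sequence[Frame]]],
    source_image: Frame,
    animate: Animator,
) -> List[Tuple[str, List[Frame]]]:
    """Re-render every speech scene so it appears to show one speaker.

    dataset holds (speaker_id, label, frames) tuples. The result holds
    (label, generated_frames); speaker_id is dropped because every
    generated scene shares the identity in source_image.
    """
    unified = []
    for _speaker_id, label, frames in dataset:
        # Each original scene serves as the driving video.
        generated = animate(source_image, frames)
        if len(generated) != len(frames):
            raise ValueError("animator must return one frame per driving frame")
        unified.append((label, generated))
    return unified


def dummy_animate(source_image, driving_frames):
    # Placeholder only (assumption): pairs the source identity with each
    # driving frame instead of running a real generative model.
    return [(source_image, motion) for motion in driving_frames]
```

A recognizer would then be trained and evaluated on the output of `unify_speaker` rather than on the original multi-speaker scenes; passing the animator as a parameter keeps the sketch independent of any particular FOMM codebase.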

Pages: 8
