Replacing Speaker-independent Recognition Task with Speaker-dependent Task for Lip-reading Using First Order Motion Model

Cited by: 1
Authors
Kodama, Michinari [1]
Saitoh, Takeshi [1]
Affiliations
[1] Kyushu Inst Technol, Kitakyushu, Fukuoka, Japan
Keywords
Lip-reading; first order motion model; speaker-dependent recognition; speaker-independent recognition
DOI
10.1117/12.2623640
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
In the lip-reading field, the speaker-independent recognition task is usually addressed by collecting speech scenes from many speakers, but this data collection is time-consuming. This paper proposes a method to address this problem. The First Order Motion Model (FOMM) is a deep generative model that generates a video sequence from a source image according to a driving video. Our idea is to apply FOMM to every speech scene in a dataset so that all scenes appear to have been recorded from a single speaker. In other words, we propose a preprocessing method that uses FOMM to replace the speaker-independent recognition task with a speaker-dependent one. We applied the proposed method to two publicly available databases, OuluVS and CUAVE, and confirmed that it improved recognition accuracy on both.
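The preprocessing idea in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `unify_speaker`, the `animate` callable, and `copy_source_animator` are hypothetical names, and the stand-in animator simply repeats the source image instead of running a real FOMM generator.

```python
from typing import Callable, Dict, List

# Placeholder types (assumption): a frame is a 2-D grid of pixel values,
# and a video is a list of frames.
Frame = List[List[float]]
Video = List[Frame]


def unify_speaker(
    dataset: Dict[str, Video],
    source_image: Frame,
    animate: Callable[[Frame, Video], Video],
) -> Dict[str, Video]:
    """Re-render every speech scene so it shows the same source face.

    `animate(source_image, driving_video)` is a hypothetical wrapper around
    an image-animation model such as FOMM: the source image supplies the
    appearance, and the driving video supplies the motion.
    """
    unified: Dict[str, Video] = {}
    for scene_id, driving_video in dataset.items():
        generated = animate(source_image, driving_video)
        # The generated scene should keep the timing of the original one.
        if len(generated) != len(driving_video):
            raise ValueError(f"frame count changed for scene {scene_id}")
        unified[scene_id] = generated
    return unified


def copy_source_animator(source: Frame, driving: Video) -> Video:
    """Stand-in for a real generator: one copy of the source per driving frame."""
    return [source for _ in driving]


if __name__ == "__main__":
    # Two toy scenes from two different speakers, with 2 and 1 frames.
    toy_dataset = {"speaker1_utt1": [[[0.0]], [[1.0]]], "speaker2_utt1": [[[2.0]]]}
    single_face = [[9.0]]
    result = unify_speaker(toy_dataset, single_face, copy_source_animator)
    print({scene: len(video) for scene, video in result.items()})
```

With a real animation model plugged in as `animate`, the returned dataset would contain only one apparent speaker, so a recognizer trained on it would face a speaker-dependent task.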
Pages: 8