Online Speaker Adaptation of an Acoustic Model Using Face Recognition

Cited by: 0
Authors
Campr, Pavel [1]
Prazak, Ales [2]
Psutka, Josef V. [2]
Psutka, Josef [2]
Affiliations
[1] Czech Tech Univ, Fac Elect Engn, Dept Cybernet, Ctr Machine Percept, Prague 16627 6, Czech Republic
[2] Univ W Bohemia, Fac Sci Appl, Dept Cybernet, Plzen 30614, Czech Republic
Keywords
acoustic model; speaker adaptation; face recognition; multimodal processing; automatic speech recognition
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose and evaluate a novel approach to online speaker adaptation of an acoustic model based on face recognition. Instead of the traditionally used audio-based speaker identification, we investigate the video modality for the task of speaker detection. A simulated online transcription produced by a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for online subtitling is evaluated using speaker-independent acoustic models, gender-dependent models, and models of particular speakers. In the experiment, the speaker-dependent acoustic models were trained offline and switched online based on the decision of a face recognizer, which reduced the Word Error Rate (WER) by 12% relative to the speaker-independent baseline system.
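The switching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FaceDecision` type, the confidence threshold, the fallback to a speaker-independent model, and all model names are assumptions introduced here for illustration. The second function shows how a relative WER reduction is computed; the figures in the usage lines are made-up values chosen to give 12%, not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FaceDecision:
    """Hypothetical output of a face recognizer for one video segment."""
    speaker_id: Optional[str]  # None when no known face is detected
    confidence: float          # score in [0, 1]; this scale is an assumption


def select_acoustic_model(decision: FaceDecision,
                          speaker_models: Dict[str, str],
                          fallback_model: str,
                          min_confidence: float = 0.8) -> str:
    """Pick the acoustic model to decode the current segment with.

    A speaker-dependent model is used only when the face recognizer names
    a known speaker with enough confidence; otherwise the fallback model
    (for example, a speaker-independent one) is used. The threshold and
    the fallback policy are assumptions, not taken from the abstract.
    """
    if (decision.speaker_id is not None
            and decision.confidence >= min_confidence
            and decision.speaker_id in speaker_models):
        return speaker_models[decision.speaker_id]
    return fallback_model


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction: (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Usage with hypothetical model names and illustrative WER values.
models = {"speaker_a": "sd_model_speaker_a"}
chosen = select_acoustic_model(FaceDecision("speaker_a", 0.93), models, "si_model")
unknown = select_acoustic_model(FaceDecision(None, 0.0), models, "si_model")
reduction = relative_wer_reduction(baseline_wer=25.0, adapted_wer=22.0)
```

Falling back to a general model when the recognizer is unsure keeps a wrong face match from selecting a mismatched speaker-dependent model, at the cost of losing the adaptation gain for that segment.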
Pages: 378-385
Number of pages: 8