Online Speaker Adaptation of an Acoustic Model Using Face Recognition

Cited by: 0

Authors:

Campr, Pavel [1]

Prazak, Ales [2]

Psutka, Josef V. [2]

Psutka, Josef [2]

Affiliations:

[1] Czech Tech Univ, Fac Elect Engn, Dept Cybernet, Ctr Machine Percept, Prague 16627 6, Czech Republic

[2] Univ W Bohemia, Fac Sci Appl, Dept Cybernet, Plzen 30614, Czech Republic

Source:

TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013 / Vol. 8082

Keywords:

acoustic model; speaker adaptation; face recognition; multimodal processing; automatic speech recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We have proposed and evaluated a novel approach for online speaker adaptation of an acoustic model based on face recognition. Instead of the traditionally used audio-based speaker identification, we investigated the video modality for the task of speaker detection. A simulated online transcription created by a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for online subtitling is evaluated using speaker-independent acoustic models, gender-dependent models, and models of particular speakers. In the experiment, the speaker-dependent acoustic models were trained offline and switched online based on the decision of a face recognizer, which reduced the Word Error Rate (WER) by 12% relative to the speaker-independent baseline system.
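
The switching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FaceDecision`, `select_acoustic_model`, and `relative_wer_reduction`, the confidence threshold, and the fallback to a speaker-independent model are all assumptions made for illustration, and the WER values in the usage lines are invented numbers chosen only to show how a 12% relative reduction is computed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FaceDecision:
    """Hypothetical output of a face recognizer for the current video frame."""
    speaker_id: Optional[str]  # None when no known face is detected
    confidence: float          # assumed score in [0, 1]


def select_acoustic_model(
    decision: FaceDecision,
    speaker_models: Dict[str, str],
    fallback_model: str,
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> str:
    """Return the speaker-dependent model for a confidently recognized
    speaker, otherwise the fallback (e.g. speaker-independent) model."""
    known = decision.speaker_id is not None and decision.speaker_id in speaker_models
    if known and decision.confidence >= threshold:
        return speaker_models[decision.speaker_id]
    return fallback_model


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction: (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Usage with made-up identifiers and WER values (illustrative only).
models = {"speaker_a": "sd_model_speaker_a"}
chosen = select_acoustic_model(FaceDecision("speaker_a", 0.93), models, "si_model")
reduction = relative_wer_reduction(baseline_wer=25.0, adapted_wer=22.0)
```

With these invented values the WER drops from 25.0 to 22.0, an absolute reduction of 3 points and a relative reduction of 3 / 25 = 12%, which is the sense in which the abstract reports its result.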


Pages: 378-385

Page count: 8
