Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives

Times cited: 14
Authors
Cerva, Petr [1]
Silovsky, Jan [1]
Zdansky, Jindrich [1]
Nouza, Jan [1]
Seps, Ladislav [1]
Affiliation
[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec 46117, Czech Republic
Keywords
Speaker adaptive; Automatic speech recognition; Speaker adaptation; Speaker diarization; Automatic transcription; Large spoken archives; ADAPTATION; ACCESS;
DOI
10.1016/j.specom.2013.06.017
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper deals with speaker-adaptive speech recognition for large spoken archives. The goal is to improve the recognition accuracy of an automatic speech recognition (ASR) system that is being deployed to transcribe a large archive of Czech radio. This archive represents a significant part of Czech cultural heritage, as it contains recordings covering 90 years of broadcasting. A large portion of these documents (100,000 h) is to be transcribed and made publicly available for browsing. To improve the transcription results, an efficient speaker-adaptive scheme is proposed. The scheme integrates speaker diarization and adaptation methods. Because the archive is enormous, it is designed to keep the Real-Time Factor (RTF) of the entire adaptation process low. It therefore employs just two decoding passes, the first of which uses a lexicon with a reduced number of entries. Moreover, the transcripts from the first pass serve not only for adaptation but also as input to the speaker diarization module, which employs two-stage clustering. The diarization output is then used by a cluster-based unsupervised Speaker Adaptation (SA) approach that also exploits the gender of each individual speaker. Experimental results on various types of programs show that our adaptation scheme significantly reduces the Word Error Rate (WER), from 22.24% for the Speaker Independent (SI) system to 18.85%, while operating at a reasonable RTF. (c) 2013 Elsevier B.V. All rights reserved.
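The abstract reports results in terms of WER and RTF. The following minimal sketch makes those metrics concrete and assumes their standard textbook definitions:

- WER = (substitutions + deletions + insertions) / number of reference words, computed from a word-level edit-distance alignment.
- RTF = processing time / audio duration.

The function names are hypothetical illustrations, not the authors' evaluation tooling. Under these definitions, the reported drop from 22.24% to 18.85% is 3.39 points absolute, or about 15.2% relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (S + D + I) / N via word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1 means the system runs faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    si_wer, sa_wer = 22.24, 18.85  # WER (%) reported in the abstract
    print(f"absolute: {si_wer - sa_wer:.2f} points")                  # absolute: 3.39 points
    print(f"relative: {100 * relative_reduction(si_wer, sa_wer):.1f} %")  # relative: 15.2 %
```

For an archive of 100,000 h, the RTF directly scales the compute budget. At an RTF of 1.0, the archive needs 100,000 h of single-process decoding, and each extra decoding pass adds to that. This is why the scheme limits itself to two passes.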
Pages: 1033-1046
Number of pages: 14