Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives

Cited by: 14
Authors
Cerva, Petr [1]
Silovsky, Jan [1]
Zdansky, Jindrich [1]
Nouza, Jan [1]
Seps, Ladislav [1]
Affiliations
[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec 46117, Czech Republic
Keywords
Speaker adaptive; Automatic speech recognition; Speaker adaptation; Speaker diarization; Automatic transcription; Large spoken archives; ADAPTATION; ACCESS
DOI
10.1016/j.specom.2013.06.017
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper deals with speaker-adaptive speech recognition for large spoken archives. The goal is to improve the recognition accuracy of an automatic speech recognition (ASR) system that is being deployed for transcription of a large archive of Czech radio. This archive represents a significant part of Czech cultural heritage, as it contains recordings covering 90 years of broadcasting. A large portion of these documents (100,000 h) is to be transcribed and made public for browsing. To improve the transcription results, an efficient speaker-adaptive scheme is proposed. The scheme is based on integration of speaker diarization and adaptation methods and is designed to achieve a low Real-Time Factor (RTF) of the entire adaptation process, because the archive's size is enormous. It thus employs just two decoding passes, where the first one is carried out using the lexicon with a reduced number of items. Moreover, the transcripts from the first pass serve not only for adaptation, but also as the input to the speaker diarization module, which employs two-stage clustering. The output of diarization is then utilized for a cluster-based unsupervised Speaker Adaptation (SA) approach that also utilizes information based on the gender of each individual speaker. Presented experimental results on various types of programs show that our adaptation scheme yields a significant Word Error Rate (WER) reduction from 22.24% to 18.85% over the Speaker Independent (SI) system while operating at a reasonable RTF. (c) 2013 Elsevier B.V. All rights reserved.
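The figures quoted in the abstract can be put in perspective with a little arithmetic. The sketch below computes the absolute and relative WER reduction from the two reported values (22.24% and 18.85%) and shows how a Real-Time Factor translates into compute time for a 100,000 h archive. The function names and the RTF value of 2.0 are assumptions chosen for illustration; the abstract does not state the RTF actually achieved.

```python
def relative_reduction(baseline: float, adapted: float) -> float:
    """Return the relative reduction of `adapted` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - adapted) / baseline


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time divided by the duration of the processed audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def compute_hours(archive_hours: float, rtf: float) -> float:
    """Single-machine compute time needed to process an archive at a given RTF."""
    return archive_hours * rtf


# WER values reported in the abstract (percent).
WER_SI = 22.24  # speaker-independent baseline
WER_SA = 18.85  # proposed speaker-adaptive scheme

absolute_gain = WER_SI - WER_SA
relative_gain = relative_reduction(WER_SI, WER_SA)

# ASSUMPTION: an RTF of 2.0 is a made-up value, not a figure from the paper.
ASSUMED_RTF = 2.0
ARCHIVE_HOURS = 100_000  # portion of the archive to be transcribed

if __name__ == "__main__":
    print(f"Absolute WER reduction: {absolute_gain:.2f} points")
    print(f"Relative WER reduction: {relative_gain:.2f} %")
    print(f"Compute at RTF {ASSUMED_RTF}: "
          f"{compute_hours(ARCHIVE_HOURS, ASSUMED_RTF):,.0f} machine-hours")
```

With these inputs the absolute gain is 3.39 percentage points, which corresponds to a relative WER reduction of about 15.24%. The compute estimate scales linearly with the RTF, which is why the scheme limits itself to two decoding passes for an archive of this size.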
Pages: 1033-1046
Page count: 14