Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives

Cited by: 14
Authors
Cerva, Petr [1]
Silovsky, Jan [1]
Zdansky, Jindrich [1]
Nouza, Jan [1]
Seps, Ladislav [1]
Affiliations
[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec 46117, Czech Republic
Keywords
Speaker adaptive; Automatic speech recognition; Speaker adaptation; Speaker diarization; Automatic transcription; Large spoken archives; ADAPTATION; ACCESS
DOI
10.1016/j.specom.2013.06.017
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper deals with speaker-adaptive speech recognition for large spoken archives. The goal is to improve the recognition accuracy of an automatic speech recognition (ASR) system that is being deployed for transcription of a large archive of Czech radio. This archive represents a significant part of Czech cultural heritage, as it contains recordings covering 90 years of broadcasting. A large portion of these documents (100,000 h) is to be transcribed and made public for browsing. To improve the transcription results, an efficient speaker-adaptive scheme is proposed. The scheme is based on integration of speaker diarization and adaptation methods and is designed to achieve a low Real-Time Factor (RTF) of the entire adaptation process, because the archive's size is enormous. It thus employs just two decoding passes, where the first one is carried out using the lexicon with a reduced number of items. Moreover, the transcripts from the first pass serve not only for adaptation, but also as the input to the speaker diarization module, which employs two-stage clustering. The output of diarization is then utilized for a cluster-based unsupervised Speaker Adaptation (SA) approach that also utilizes information based on the gender of each individual speaker. Presented experimental results on various types of programs show that our adaptation scheme yields a significant Word Error Rate (WER) reduction from 22.24% to 18.85% over the Speaker Independent (SI) system while operating at a reasonable RTF. (c) 2013 Elsevier B.V. All rights reserved.
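The figures quoted in the abstract can be put in perspective with a little arithmetic. The sketch below computes the absolute and relative WER reduction from the two reported values, and shows how an RTF (processing time divided by audio duration) translates into compute time for a 100,000 h archive. The function names and the RTF values in the loop are hypothetical, chosen for illustration only; the abstract does not state the RTF actually achieved.

```python
def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction, as a fraction of the baseline WER."""
    return (baseline_wer - adapted_wer) / baseline_wer


def processing_hours(audio_hours: float, rtf: float) -> float:
    """Compute time implied by an RTF (processing time / audio duration)."""
    return audio_hours * rtf


# Figures taken from the abstract.
SI_WER = 22.24           # speaker-independent system, WER in percent
SA_WER = 18.85           # speaker-adaptive scheme, WER in percent
ARCHIVE_HOURS = 100_000  # portion of the archive to be transcribed

absolute = SI_WER - SA_WER
relative = relative_reduction(SI_WER, SA_WER)
print(f"absolute WER reduction: {absolute:.2f} points")
print(f"relative WER reduction: {relative:.1%}")

# HYPOTHETICAL RTF values, for illustration only; not reported in the abstract.
for rtf in (0.5, 1.0, 2.0):
    hours = processing_hours(ARCHIVE_HOURS, rtf)
    print(f"RTF {rtf}: {hours:,.0f} h of compute")
```

At this archive size, every 0.1 added to the RTF costs a further 10,000 h of processing, which is presumably why the scheme limits itself to two decoding passes.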
Pages: 1033-1046
Page count: 14