Incorporation of the ASR Output in Speaker Segmentation and Clustering within the Task of Speaker Diarization of Broadcast Streams

Cited by: 0
Authors
Silovsky, Jan [1]
Zdansky, Jindrich [1]
Nouza, Jan [1]
Cerva, Petr [1]
Prazak, Jan [1]
Affiliations
[1] Tech Univ Liberec, Fac Mechatron, Inst Informat Technol & Elect, Liberec, Czech Republic
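Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code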
0808 ; 0809 ;
Abstract
In this paper we study the effect of incorporating automatic transcriptions into the speaker diarization process. We aim to improve both the diarization accuracy, as evaluated by standard objective measures, and the quality of the diarization output from the user's perspective. Although the presented approach relies on the output of an automatic speech recognizer, it makes no use of lexical information. Instead, we use information about word boundaries and the classification of non-speech events occurring in the processed stream. The former is used as a constraint on speaker change-point candidates, and the latter makes it possible to disregard various vocal noise sounds that carry no speaker-specific information (given the representation of the signal by cepstral features) and thus harm the speaker's representation. The experimental evaluation of the presented approach was carried out using the COST278 multilingual broadcast news database. We demonstrate that the approach yields improvements in both speaker diarization and segmentation performance measures. Furthermore, we show that the number of change-points detected within words (rather than at their boundaries) is significantly reduced.
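The two uses of ASR output described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `AsrToken` type, its fields, and both function names are hypothetical, and it simply assumes the recognizer emits time-stamped tokens with non-speech events already flagged. Token boundaries become the only admissible change-point candidates, and intervals flagged as noise are left out of the material used for speaker modelling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AsrToken:
    """Hypothetical time-stamped ASR output item (times in seconds)."""
    start: float
    end: float
    label: str
    is_noise: bool = False  # assumed flag for non-speech events (breath, cough, ...)


def boundary_candidates(tokens: List[AsrToken]) -> List[float]:
    """Collect token boundaries as the only allowed change-point candidates."""
    times = set()
    for tok in tokens:
        times.add(tok.start)
        times.add(tok.end)
    return sorted(times)


def speech_intervals(tokens: List[AsrToken]) -> List[Tuple[float, float]]:
    """Keep only intervals not flagged as noise, for use in speaker modelling."""
    return [(tok.start, tok.end) for tok in tokens if not tok.is_noise]
```

Under these assumptions, a change-point detector would evaluate its criterion only at the times returned by `boundary_candidates`, so no change-point can fall inside a word, and cepstral features would be taken only from the intervals returned by `speech_intervals`.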
Pages: 118-123
Number of pages: 6
Related papers
  • [1] PLDA-based Clustering for Speaker Diarization of Broadcast Streams
    Silovsky, Jan
    Prazak, Jan
    Cerva, Petr
    Zdansky, Jindrich
    Nouza, Jan
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2920 - +
  • [2] Comparison of Segmentation and Clustering Methods for Speaker Diarization of Broadcast Stream Audio
    Prazak, Jan
    Silovsky, Jan
    [J]. ANALYSIS OF VERBAL AND NONVERBAL COMMUNICATION AND ENACTMENT: THE PROCESSING ISSUES, 2011, 6800 : 214 - 222
  • [3] Speaker diarization of French broadcast news
    Gupta, Vishwa
    Boulianne, Gilles
    Kenny, Patrick
    Ouellet, Pierre
    Dumouchel, Pierre
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4365 - 4368
  • [4] Multistage speaker diarization of broadcast news
    Barras, Claude
    Zhu, Xuan
    Meignier, Sylvain
    Gauvain, Jean-Luc
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05): : 1505 - 1512
  • [5] Robust Speaker Diarization for News Broadcast
    Karthik, M. L. N. S.
    Ganesh, Mirishkar Sai
    Patnaik, Bijayananda
    [J]. 2018 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2018,
  • [6] Bayes Factor Based Speaker Segmentation for Speaker Diarization
    Wang, D.
    Vogt, R.
    Sridharan, S.
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1405 - 1408
  • [7] Factor Analysis for Speaker Segmentation and Improved Speaker Diarization
    Desplanques, Brecht
    Demuynck, Kris
    Martens, Jean-Pierre
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3081 - 3085
  • [8] Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems
    Zibert, Janez
    Mihelic, France
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1040 - +
  • [9] Speaker diarization: From broadcast news to lectures
    Zhu, X.
    Barras, C.
    Lamel, L.
    Gauvain, J-L.
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299 : 396 - +
  • [10] Domain Adaptation of PLDA models in Broadcast Diarization by means of Unsupervised Speaker Clustering
    Vinals, Ignacio
    Ortega, Alfonso
    Villalba, Jesus
    Miguel, Antonio
    Lleida, Eduardo
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2829 - 2833