Purity algorithms for speaker diarization of meetings data

Authors
Anguera, Xavier [1]
Wooters, Chuck [1]
Hernando, Javier [1]
Affiliation
[1] ICSI, Berkeley, CA 94704 USA
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification code
070206; 082403;
Abstract
Speaker diarization commonly uses an agglomerative clustering approach in which the acoustic data is first split into small pieces and pairs of clusters are then merged until a stopping point is reached. A purely agglomerative technique cannot split one cluster into two, so errors caused by multiple speakers being assigned to one cluster can be common. Furthermore, clusters often contain non-speech frames, which creates problems when deciding which two clusters to merge and when to stop the clustering. In this paper, we present two algorithms that aim to purify the clusters. The first assigns conflicting speech segments to a new cluster, and the second detects and eliminates non-speech frames when comparing two clusters. We show relative improvements of over 18% on three datasets from the most current Rich Transcription (RT) evaluations.
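The agglomerative loop described in the abstract (split the data into small pieces, then merge pairs until a stopping point) can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not state the merge criterion or the stopping rule, so the centroid distance and the `stop_threshold` parameter below are stand-in assumptions, the function name `agglomerative_cluster` is hypothetical, and the one-dimensional "frames" are synthetic toy values rather than real acoustic features.

```python
from statistics import mean


def agglomerative_cluster(frames, n_initial, stop_threshold):
    """Toy bottom-up clustering: uniform initial split, then pairwise merging.

    Assumption: clusters are compared by the distance between their means.
    A real diarization system would use a statistical model-based criterion.
    """
    # Split the frame sequence into roughly n_initial contiguous pieces.
    size = max(1, len(frames) // n_initial)
    clusters = [frames[k:k + size] for k in range(0, len(frames), size)]

    while len(clusters) > 1:
        # Find the closest pair of clusters under the assumed distance.
        dist, i, j = min(
            (abs(mean(clusters[a]) - mean(clusters[b])), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        # Stopping point: the best remaining pair is too dissimilar to merge.
        if dist > stop_threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Synthetic example: six frames near 0 followed by six frames near 5.
frames = [0.0, 0.2, 0.1, 0.3, 0.1, 0.0, 5.0, 5.2, 4.9, 5.1, 5.0, 5.3]
clusters = agglomerative_cluster(frames, n_initial=6, stop_threshold=1.0)
```

With this toy input the loop ends with two clusters of six frames each. Note that clusters are only ever merged, never split: if an initial piece straddled both groups, the mixed frames would stay together, which is the kind of impurity the two purification algorithms in the abstract aim to address.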
Pages: 1025-1028
Page count: 4