Extending the Task of Diarization to Speaker Attribution

Authors
Ghaemmaghami, Houman [1]
Dean, David [1]
Vogt, Robbie [1]
Sridharan, Sridha [1]
Affiliation
[1] Queensland Univ Technol, Speech & Audio Res Lab, Brisbane, Qld 4001, Australia
Keywords
speaker attribution; diarization; clustering; cross likelihood ratio; joint factor analysis
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we extend the concept of speaker annotation within a single recording, or speaker diarization, to a collection-wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering the inter-session clusters obtained using diarization, which are expected to be homogeneous, according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix, and the complete linkage algorithm is employed to conduct clustering of the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.
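The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pairwise scores (for example, normalized cross-likelihood ratios) have already been computed into a symmetric matrix where higher means "more likely the same speaker", and the function name `complete_linkage`, the toy matrix, and the stopping threshold of 0.0 are all hypothetical choices made for illustration.

```python
from itertools import combinations


def complete_linkage(scores, threshold):
    """Agglomerative clustering with complete linkage on a similarity matrix.

    scores: symmetric list of lists of pairwise similarities
            (assumed precomputed; higher = more similar).
    threshold: hypothetical stopping value; merging stops once the best
               available link falls below it.
    Returns a sorted list of sorted index lists, one per merged group.
    """
    groups = [[i] for i in range(len(scores))]
    while len(groups) > 1:
        best_pair, best_score = None, None
        for a, b in combinations(range(len(groups)), 2):
            # Complete linkage: two groups are only as similar as their
            # least similar pair of members.
            link = min(scores[i][j] for i in groups[a] for j in groups[b])
            if best_score is None or link > best_score:
                best_pair, best_score = (a, b), link
        if best_score < threshold:
            break
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        groups[a] = sorted(groups[a] + groups[b])
        del groups[b]
    return sorted(groups)


# Hypothetical attribution matrix for four diarization clusters
# (made-up values, not taken from the paper).
toy_scores = [
    [0.0, 2.1, -1.5, -1.2],
    [2.1, 0.0, -0.9, -1.7],
    [-1.5, -0.9, 0.0, 1.8],
    [-1.2, -1.7, 1.8, 0.0],
]
speakers = complete_linkage(toy_scores, threshold=0.0)
print(speakers)
```

In this toy case, clusters 0 and 1 end up attributed to one identity and clusters 2 and 3 to another. Because complete linkage scores a group pair by its weakest member pair, a single strong cross-pair cannot chain otherwise dissimilar clusters together.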
Pages: 1056-1059
Page count: 4