Randomization Effect on Iterative-Based Speaker Diarization System for Telephone Conversations

Cited by: 0
Authors
Furmanov, Tal [1]
Aminov, Lidiya [2]
Moyal, Ami [2]
Lapidot, Itshak [2]
Affiliations
[1] Appl Mat Inc, Rehovot, Israel
[2] Afeka Tel Aviv Acad Coll Engn, ACLP Afeka Ctr Language Proc, Tel Aviv, Israel
Keywords
hidden-distortion model (HDM); self-organizing maps (SOM); K-means; initialization; speaker diarization;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The primary objective of a speaker diarization system is to assign each speech segment to one of the K speakers in the conversation. We use a hidden-distortion-model (HDM)-based system. HDM allows different emission models to be used as speaker models. We investigate the effect of randomization at two levels: stochastic versus deterministic training, and random model initialization versus preserving the initialization from the previous iteration. The emission models were codebooks (CBs) trained using the K-means algorithm, in both its batch and stochastic versions, as well as a self-organizing map (SOM) in its stochastic version. The evaluation was performed on 108 telephone conversations from the LDC CallHome corpus. We show that randomization always outperforms deterministic training. Stochastic training demonstrated a relative improvement of 3.5%. Random initialization achieved a relative improvement of 7.28% compared to preserving the initialization from the previous iteration.
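The two levels of randomization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`batch_kmeans`, `stochastic_kmeans`, `random_codebook`, `distortion`), the squared Euclidean distance, and the 1/count learning rate in the stochastic update are all assumptions chosen for illustration. The HDM itself and the SOM emission model are not shown.

```python
import numpy as np

def batch_kmeans(data, codebook, n_iters=20):
    """Deterministic (batch) K-means: centroids are recomputed once per full pass."""
    codebook = codebook.copy()
    for _ in range(n_iters):
        # Squared Euclidean distance from every vector to every codeword.
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(codebook)):
            members = data[labels == k]
            if len(members) > 0:  # leave empty codewords unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def stochastic_kmeans(data, codebook, n_epochs=20, rng=None):
    """Stochastic (online) K-means: one codeword is nudged per sample,
    and samples are visited in a random order in every epoch."""
    rng = np.random.default_rng() if rng is None else rng
    codebook = codebook.copy()
    counts = np.ones(len(codebook))  # assumed 1/count learning-rate schedule
    for _ in range(n_epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            k = ((codebook - x) ** 2).sum(axis=1).argmin()
            counts[k] += 1
            codebook[k] += (x - codebook[k]) / counts[k]
    return codebook

def random_codebook(data, size, rng):
    """Random initialization: draw distinct data vectors as initial codewords."""
    idx = rng.choice(len(data), size=size, replace=False)
    return data[idx].copy()

def distortion(data, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()
```

In an iterative system of this kind, each outer iteration would retrain the speaker codebooks on the newly assigned segments. Under the assumptions above, "preserving initialization" corresponds to passing the previous iteration's codebook as `codebook`, while "random initialization" corresponds to calling `random_codebook` again before each retraining.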
Pages: 5
Related papers
34 in total
  • [1] Initialization of Iterative-Based Speaker Diarization Systems for Telephone Conversations
    Ben-Harush, Oshry
    Ben-Harush, Ortal
    Lapidot, Itshak
    Guterman, Hugo
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (02) : 414 - 425
  • [2] Mahalanobis Based Emission Model for Speaker Diarization of Telephone Conversations
    Furmanov, Tal
    Aminov, Lidiya
    Moyal, Ami
    Lapidot, Itshak
    2014 IEEE 28TH CONVENTION OF ELECTRICAL & ELECTRONICS ENGINEERS IN ISRAEL (IEEEI), 2014
  • [3] Full-Posterior PLDA based Speaker Diarization of telephone conversations
    Chen, Yanni
    Yan, Yonghong
    Hong, Wei
    Guan, Songzan
    PROCEEDINGS FIRST INTERNATIONAL CONFERENCE ON ELECTRONICS INSTRUMENTATION & INFORMATION SYSTEMS (EIIS 2017), 2017 : 840 - 844
  • [4] VARIATIONAL BAYES BASED I-VECTOR FOR SPEAKER DIARIZATION OF TELEPHONE CONVERSATIONS
    Zheng, Rong
    Zhang, Ce
    Zhang, Shanshan
    Xu, Bo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [5] Multiple feature combination to improve speaker diarization of telephone conversations
    Gupta, Vishwa
    Kenny, Patrick
    Ouellet, Pierre
    Boulianne, Gilles
    Dumouchel, Pierre
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007 : 705 - 710
  • [6] PLDA-BASED DIARIZATION OF TELEPHONE CONVERSATIONS
    Bulut, Ahmet Emin
    Demir, Hakan
    Isik, Yusuf Ziya
    Erdogan, Hakan
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015 : 4809 - 4813
  • [7] A speaker count system for telephone conversations
    Ofoegbu, Uchechukwu O.
    Iyer, Ananth N.
    Yantorno, Robert E.
    Smolenski, Brett Y.
    2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2, 2006 : 307 - 310
  • [8] Combining gaussianized/non-gaussianized features to improve speaker diarization of telephone conversations
    Gupta, Vishwa
    Kenny, Patrick
    Ouellet, Pierre
    Boulianne, Gilles
    Dumouchel, Pierre
    IEEE SIGNAL PROCESSING LETTERS, 2007, 14 (12) : 1040 - 1043
  • [9] CONVOLUTIONAL NEURAL NETWORK FOR SPEAKER CHANGE DETECTION IN TELEPHONE SPEAKER DIARIZATION SYSTEM
    Hruz, Marek
    Zajic, Zbynek
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017 : 4945 - 4949
  • [10] Recurrent Neural Network Based Speaker Change Detection from Text Transcription Applied in Telephone Speaker Diarization System
    Zajic, Zbynek
    Soutner, Daniel
    Hruz, Marek
    Muller, Ludek
    Radova, Vlasta
    TEXT, SPEECH, AND DIALOGUE (TSD 2018), 2018, 11107 : 342 - 350