Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech

Authors
Zajic, Zbynek [1]
Zelinka, Jan [1, 2]
Mueller, Ludek [1, 2]
Affiliations
[1] Univ West Bohemia, NTIS New Technol Informat Soc, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic
[2] Univ West Bohemia, Dept Cybernet, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic
Source
Keywords
Neural network; Speaker diarization; i-Vector
DOI
10.1007/978-3-319-66429-3_55
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we investigate a speaker representation for a diarization system that clusters short segments of telephone conversations so that segments produced by the same speaker are grouped together. The proposed approach applies a neural-network-based descriptor in place of the usual i-vector descriptor used in state-of-the-art diarization systems. The two techniques were compared on the English part of the CallHome corpus. The final results indicate that the i-vector approach is superior on its own, although the proposed descriptor contributes additional information. The combined descriptor therefore represents a speaker in a segment with a lower diarization error (almost 20% relative improvement over using the i-vector alone).
Pages: 555-563
Number of pages: 9
Related papers
  • [1] CONVOLUTIONAL NEURAL NETWORK FOR SPEAKER CHANGE DETECTION IN TELEPHONE SPEAKER DIARIZATION SYSTEM
    Hruz, Marek
    Zajic, Zbynek
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4945 - 4949
  • [2] ARTIFICIAL NEURAL NETWORK FEATURES FOR SPEAKER DIARIZATION
    Yella, Harsha
    Stolcke, Andreas
    Slaney, Malcolm
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 402 - 406
  • [3] Recurrent Neural Network Based Speaker Change Detection from Text Transcription Applied in Telephone Speaker Diarization System
    Zajic, Zbynek
    Soutner, Daniel
    Hruz, Marek
    Muller, Ludek
    Radova, Vlasta
    TEXT, SPEECH, AND DIALOGUE (TSD 2018), 2018, 11107 : 342 - 350
  • [4] Speech Segmentation and Speaker Diarization using Time-Delay Neural Network
    Toruk, Mesut
    Serbes, Ahmet
    Bilgin, Gokhan
    2019 INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS CONFERENCE (ASYU), 2019, : 335 - 339
  • [5] An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings
    Serafini, Luca
    Cornell, Samuele
    Morrone, Giovanni
    Zovato, Enrico
    Brutti, Alessio
    Squartini, Stefano
    COMPUTER SPEECH AND LANGUAGE, 2023, 82
  • [6] Online Neural Speaker Diarization With Target Speaker Tracking
    Wang, Weiqing
    Li, Ming
    IEEE/ACM Transactions on Audio Speech and Language Processing, 2024, 32 : 5078 - 5091
  • [7] A Comparison of Neural Network Feature Transforms for Speaker Diarization
    Yella, Sree Harsha
    Stolcke, Andreas
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3026 - 3030
  • [8] SPEAKER DIARIZATION USING DEEP NEURAL NETWORK EMBEDDINGS
    Garcia-Romero, Daniel
    Snyder, David
    Sell, Gregory
    Povey, Daniel
    McCree, Alan
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4930 - 4934
  • [9] Neural speech turn segmentation and affinity propagation for speaker diarization
    Yin, Ruiqing
    Bredin, Herve
    Barras, Claude
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1393 - 1397
  • [10] Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization
    Taha, T. M.
    Messaoud, Z. B.
    Frikha, M.
    International Journal of Interactive Mobile Technologies, 2024, 18 (03): : 88 - 103