Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech

被引:0
|
作者
Zajic, Zbynek [1 ]
Zelinka, Jan [1 ,2 ]
Mueller, Ludek [1 ,2 ]
机构
[1] Univ West Bohemia, NTIS New Technol Informat Soc, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic
[2] Univ West Bohemia, Dept Cybernet, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic
来源
关键词
Neural network; Speaker diarization; i-Vector;
D O I
10.1007/978-3-319-66429-3_55
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we have been investigating an approach to a speaker representation for a diarization system that clusters short telephone conversation segments (produced by the same speaker). The proposed approach applies a neural-network-based descriptor that replaces a usual i-vector descriptor in the state-of-the-art diarization systems. The comparison of these two techniques was done on the English part of the CallHome corpus. The final results indicate the superiority of the i-vector's approach although our proposed descriptor brings an additive information. Thus, the combined descriptor represents a speaker in a segment for diarization purpose with lower diarization error (almost 20% relative improvement compared with only i-vector application).
引用
下载
收藏
页码:555 / 563
页数:9
相关论文
共 50 条
  • [31] SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Ochiai, Tsubasa
    Nakatani, Tomohiro
    Burget, Lukas
    Cernocky, Jan
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 800 - 814
  • [32] Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement
    Zajic, Zbynek
    Hruz, Marek
    Mueller, Ladek
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3562 - 3566
  • [33] Neural Speaker Extraction with Speaker-Speech Cross-Attention Network
    Wang, Wupeng
    Xu, Chenglin
    Ge, Meng
    Li, Haizhou
    INTERSPEECH 2021, 2021, : 3535 - 3539
  • [34] Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
    Cyrta, Pawel
    Trzcinski, Tomasz
    Stokowiec, Wojciech
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 107 - 117
  • [35] Speaker diarization using autoassociative neural networks
    Jothilakshmi, S.
    Ramalingam, V.
    Palanivel, S.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2009, 22 (4-5) : 667 - 675
  • [36] Speaker Diarization through Waveform and Neural Net
    Latypov, Rustam
    Stolov, Evgeni
    PROCEEDINGS OF THE 2021 29TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), VOL 1, 2021, : 234 - 239
  • [37] Mahalanobis Based Emission Model for Speaker Diarization of Telephone Conversations
    Furmanov, Tal
    Aminov, Lidiya
    Moyal, Ami
    Lapidot, Itshak
    2014 IEEE 28TH CONVENTION OF ELECTRICAL & ELECTRONICS ENGINEERS IN ISRAEL (IEEEI), 2014,
  • [38] Multiple feature combination to improve speaker diarization of telephone conversations
    Gupta, Vishwa
    Kenny, Patrick
    Ouellet, Pierre
    Boulianne, Gilles
    Dumouchel, Pierre
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 705 - 710
  • [39] TOWARDS END-TO-END SPEAKER DIARIZATION WITH GENERALIZED NEURAL SPEAKER CLUSTERING
    Zhang, Chunlei
    Shi, Jiatong
    Weng, Chao
    Yu, Meng
    Yu, Dong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8372 - 8376
  • [40] Speaker normalization on conversational telephone speech
    Wegmann, S
    McAllaster, D
    Orloff, J
    Peskin, B
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 339 - 341