Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech

被引:0
|
作者
Zajic, Zbynek [1 ]
Zelinka, Jan [1 ,2 ]
Mueller, Ludek [1 ,2 ]
机构
[1] Univ West Bohemia, NTIS New Technol Informat Soc, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic
[2] Univ West Bohemia, Dept Cybernet, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic
来源
关键词
Neural network; Speaker diarization; i-Vector;
D O I
10.1007/978-3-319-66429-3_55
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we have been investigating an approach to a speaker representation for a diarization system that clusters short telephone conversation segments (produced by the same speaker). The proposed approach applies a neural-network-based descriptor that replaces a usual i-vector descriptor in the state-of-the-art diarization systems. The comparison of these two techniques was done on the English part of the CallHome corpus. The final results indicate the superiority of the i-vector's approach although our proposed descriptor brings an additive information. Thus, the combined descriptor represents a speaker in a segment for diarization purpose with lower diarization error (almost 20% relative improvement compared with only i-vector application).
引用
下载
收藏
页码:555 / 563
页数:9
相关论文
共 50 条
  • [41] The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization
    Zelenak, Martin
    Hernando, Javier
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1048 - 1051
  • [42] Simultaneous Speech Detection With Spatial Features for Speaker Diarization
    Zelenak, Martin
    Segura, Carlos
    Luque, Jordi
    Hernando, Javier
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (02): : 436 - 446
  • [43] Speaker Diarization with Enhancing Speech for the First DIHARD Challenge
    Sun, Lei
    Du, Jun
    Jiang, Chao
    Zhang, Xueyang
    He, Shan
    Yin, Bing
    Lee, Chin-Hui
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2793 - 2797
  • [44] INVESTIGATION ON NEURAL BANDWIDTH EXTENSION OF TELEPHONE SPEECH FOR IMPROVED SPEAKER RECOGNITION
    Nidadavolu, Phani Sankar
    Iglesias, Vicente
    Villalba, Jesus
    Dehak, Najim
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6111 - 6115
  • [45] Speaker-aware neural network based beamformer for speaker extraction in speech mixtures
    Zmplikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Higuchi, Takuya
    Ogawa, Atsunori
    Nakatani, Tomohiro
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2655 - 2659
  • [46] Multimodal Speaker Diarization
    Noulas, Athanasios
    Englebienne, Gwenn
    Krose, Ben J. A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
  • [47] SPEAKER DIARIZATION WITH LSTM
    Wang, Quan
    Downey, Carlton
    Wan, Li
    Mansfield, Philip Andrew
    Moreno, Ignacio Lopez
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5239 - 5243
  • [48] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Fahad, Md Shah
    Ranjan, Ashish
    Deepak, Akshay
    Pradhan, Gayadhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
  • [49] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Md Shah Fahad
    Ashish Ranjan
    Akshay Deepak
    Gayadhar Pradhan
    Circuits, Systems, and Signal Processing, 2022, 41 : 6113 - 6135
  • [50] Speech Activity Detection Under Adverse Conditions Using Neural Networks and Speaker Diarization
    Ulgen, Ismail Rasim
    Saraclar, Murat
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,