Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech

Cited by: 0

Authors:

Zajic, Zbynek ^{[1]}

Zelinka, Jan ^{[1,2]}

Mueller, Ludek ^{[1,2]}

Affiliations:

[1] Univ West Bohemia, NTIS New Technol Informat Soc, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic

[2] Univ West Bohemia, Dept Cybernet, Fac Appl Sci, Univ 8, Plzen 30614, Czech Republic

Source:

SPEECH AND COMPUTER, SPECOM 2017 | 2017 / Vol. 10458

Keywords:

Neural network; Speaker diarization; i-Vector;

DOI:

10.1007/978-3-319-66429-3_55

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we investigate an approach to speaker representation for a diarization system that clusters short telephone conversation segments (produced by the same speaker). The proposed approach applies a neural-network-based descriptor in place of the usual i-vector descriptor used in state-of-the-art diarization systems. The two techniques were compared on the English part of the CallHome corpus. The final results indicate the superiority of the i-vector approach, although our proposed descriptor contributes additional information. Thus, the combined descriptor represents a speaker in a segment for diarization purposes with a lower diarization error (almost 20% relative improvement compared with using the i-vector alone).

Pages: 555 - 563

Page count: 9

50 items in total

[41] The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization
Zelenak, Martin
Hernando, Javier
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1048 - 1051
[42] Simultaneous Speech Detection With Spatial Features for Speaker Diarization
Zelenak, Martin
Segura, Carlos
Luque, Jordi
Hernando, Javier
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (02) : 436 - 446
[43] Speaker Diarization with Enhancing Speech for the First DIHARD Challenge
Sun, Lei
Du, Jun
Jiang, Chao
Zhang, Xueyang
He, Shan
Yin, Bing
Lee, Chin-Hui
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2793 - 2797
[44] INVESTIGATION ON NEURAL BANDWIDTH EXTENSION OF TELEPHONE SPEECH FOR IMPROVED SPEAKER RECOGNITION
Nidadavolu, Phani Sankar
Iglesias, Vicente
Villalba, Jesus
Dehak, Najim
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6111 - 6115
[45] Speaker-aware neural network based beamformer for speaker extraction in speech mixtures
Zmolikova, Katerina
Delcroix, Marc
Kinoshita, Keisuke
Higuchi, Takuya
Ogawa, Atsunori
Nakatani, Tomohiro
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2655 - 2659
[46] Multimodal Speaker Diarization
Noulas, Athanasios
Englebienne, Gwenn
Krose, Ben J. A.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
[47] SPEAKER DIARIZATION WITH LSTM
Wang, Quan
Downey, Carlton
Wan, Li
Mansfield, Philip Andrew
Moreno, Ignacio Lopez
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5239 - 5243
[48] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
Fahad, Md Shah
Ranjan, Ashish
Deepak, Akshay
Pradhan, Gayadhar
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
[49] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
Md Shah Fahad
Ashish Ranjan
Akshay Deepak
Gayadhar Pradhan
Circuits, Systems, and Signal Processing, 2022, 41 : 6113 - 6135
[50] Speech Activity Detection Under Adverse Conditions Using Neural Networks and Speaker Diarization
Ulgen, Ismail Rasim
Saraclar, Murat
2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
