Mahalanobis Based Emission Model for Speaker Diarization of Telephone Conversations

Cited by: 0

Authors:

Furmanov, Tal [1]

Aminov, Lidiya [2]

Moyal, Ami [2]

Lapidot, Itshak [2]

Affiliations:

[1] Appl Mat Inc, Rehovot, Israel

[2] Afeka Tel Aviv Acad Coll Engn, ACLP Afeka Ctr Language Proc, Tel Aviv, Israel

Source:

2014 IEEE 28TH CONVENTION OF ELECTRICAL & ELECTRONICS ENGINEERS IN ISRAEL (IEEEI) | 2014

Keywords:

Hidden-distortion model (HDM); self-organizing maps (SOM); K-means; Mahalanobis distance; speaker diarization;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

The primary objective of any speaker diarization system is to assign each speech segment to one of the K speakers in a conversation. In this work we focus on telephone conversations, where the number of speakers is known and equal to 2. We use a hidden-distortion-model (HDM)-based system. HDM allows different emission models to be used as speaker models. The choice of adequate emission models that properly represent the data characteristics is important for the system's performance. We investigate the effect of several codebook (CB)-based emission models with Euclidean and Mahalanobis distances. The Mahalanobis distance was chosen due to its potential to produce a better representation of the data's spatial layout, while constraints were imposed to keep the model from diverging. The influence of the different methods is evaluated using 108 telephone conversations taken from the LDC CallHome corpus. All the experiments achieved results poorer than the original SOM-based system (DER=12.70%).
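To illustrate how the choice of distance can change which codeword a feature vector is assigned to, here is a minimal Python sketch. It is not the authors' implementation. The function names, the diagonal-covariance simplification, and the variance floor (standing in for the constraints the abstract mentions but does not specify) are all assumptions made for illustration.

```python
def euclidean_sq(x, c):
    """Squared Euclidean distance between vector x and codeword c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def mahalanobis_sq_diag(x, c, var, var_floor=1e-3):
    """Squared Mahalanobis distance with a diagonal covariance.

    Assumption: each variance is floored at var_floor so that a
    collapsing variance cannot make the distance blow up.
    """
    return sum(
        (xi - ci) ** 2 / max(vi, var_floor)
        for xi, ci, vi in zip(x, c, var)
    )


def nearest_codeword(x, codebook, variances=None, var_floor=1e-3):
    """Return (index, distance) of the closest codeword.

    Uses the Euclidean distance when variances is None, otherwise the
    diagonal Mahalanobis distance with one variance vector per codeword.
    """
    if variances is None:
        dists = [euclidean_sq(x, c) for c in codebook]
    else:
        dists = [
            mahalanobis_sq_diag(x, c, v, var_floor)
            for c, v in zip(codebook, variances)
        ]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best]


# Hypothetical two-codeword codebook in a 2-D feature space.
codebook = [(0.0, 0.0), (3.0, 0.0)]
# Codeword 0 is spread widely along the first axis, codeword 1 is tight.
variances = [(4.0, 1.0), (0.25, 1.0)]
x = (1.8, 0.0)

euclid_index, _ = nearest_codeword(x, codebook)
mahal_index, _ = nearest_codeword(x, codebook, variances)
```

In this toy setup the point is geometrically closer to the second codeword, yet the Mahalanobis distance assigns it to the first one because that codeword's cluster is much wider along the first axis. This is the sense in which the distance can reflect the spatial layout of the data.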


Pages: 5

