Unsupervised speaker indexing using speaker model selection based on Bayesian information criterion

Cited by: 0

Authors:

Nishida, M [1]

Kawahara, T [1]

Affiliation:

[1] JST, PRESTO, Sakyo Ku, Kyoto 6068501, Japan

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I | 2003

DOI: Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper addresses unsupervised speaker indexing for discussion audio archives. In discussions, the speaker changes frequently, so utterances are very short and vary widely in duration, which causes significant problems when applying conventional methods such as model adaptation and Variance-BIC (Bayesian Information Criterion) methods. We propose a flexible framework that selects an optimal speaker model (GMM or VQ) based on the BIC according to the duration of utterances. When the speech segment is short, the simple and robust VQ-based method is expected to be chosen, while a GMM can be reliably trained for long segments. For a discussion archive with a total duration of 10 hours, it is demonstrated that the proposed method achieves higher indexing performance than conventional methods.
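The selection idea in the abstract can be illustrated with a minimal sketch. It assumes the common "higher is better" form of the BIC (log-likelihood minus a penalty of half the parameter count times the log of the number of frames). The abstract does not give the paper's exact criterion or say how a likelihood is defined for the VQ model, so the function names, the `penalty_weight` parameter, and every number below are hypothetical and chosen only to show the trade-off.

```python
import math


def bic_score(log_likelihood, n_params, n_frames, penalty_weight=1.0):
    """Generic BIC, higher is better: fit term minus complexity penalty.

    This is the textbook form, not necessarily the paper's variant.
    """
    return log_likelihood - 0.5 * penalty_weight * n_params * math.log(n_frames)


def select_model(candidates, n_frames, penalty_weight=1.0):
    """Pick the candidate with the highest BIC.

    candidates maps a model name to (log_likelihood, n_params).
    Returns (best_name, scores_by_name).
    """
    scores = {
        name: bic_score(ll, k, n_frames, penalty_weight)
        for name, (ll, k) in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical short segment (100 frames): the larger model fits slightly
# better, but its penalty outweighs the gain.
short_choice, _ = select_model(
    {"VQ": (-5200.0, 100), "GMM": (-5100.0, 200)}, n_frames=100
)

# Hypothetical long segment (3000 frames): the likelihood gain grows with
# the amount of data, while the penalty grows only logarithmically.
long_choice, _ = select_model(
    {"VQ": (-156000.0, 100), "GMM": (-153000.0, 200)}, n_frames=3000
)

print(short_choice, long_choice)
```

With these illustrative numbers the simpler model is chosen for the short segment and the larger one for the long segment, which matches the behaviour the abstract expects from duration-dependent selection.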

Pages: 172-175

Page count: 4
