Speaker change detection in casual conversations using excitation source features

Cited by: 4

Authors:

Dhananjaya, N. [1]

Yegnanarayana, B. [2]

Affiliations:

[1] Indian Institute of Technology, Madras 600036, Tamil Nadu, India

[2] International Institute of Information Technology, Hyderabad, Andhra Pradesh, India

Source:

SPEECH COMMUNICATION | 2008 / Vol. 50 / No. 2

Keywords:

speaker change detection; multispeaker conversation; autoassociative neural network (AANN) models; excitation source features; linear prediction (LP) residual;

DOI:

10.1016/j.specom.2007.08.003

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In this paper, we propose a method for speaker change detection using features of the excitation source of the speech production mechanism. The method uses neural network models to capture speaker-specific information from a signal that predominantly represents the excitation source. The focus of this paper is on speaker change detection in casual telephone conversations, in which short (<5 s) speaker turns are common. When only a limited amount of speech data is available, excitation source features are a better choice for modeling a speaker than vocal tract system features. The linear prediction (LP) residual is used as an estimate of the excitation source signal. Autoassociative neural network (AANN) models are proposed to capture the higher-order relations among the samples of the residual signal. Speaker models are generated for every one second of voiced speech from the first few seconds of the conversation, and these models are used to detect the speaker change points. The performance of the proposed method is evaluated on a database containing several two-speaker conversations. (C) 2007 Elsevier B.V. All rights reserved.
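The first processing stage named in the abstract, estimating the excitation source by the LP residual, can be illustrated with a minimal pure-Python sketch. It assumes the autocorrelation method of LP analysis solved with the Levinson-Durbin recursion, an LP order of 10, 160-sample Hamming-windowed analysis frames and an 80-sample hop (20 ms and 10 ms at an assumed 8 kHz telephone sampling rate). These parameters and the helper names `autocorrelation`, `levinson_durbin`, and `lp_residual` are illustrative assumptions, not values or code taken from the paper. The AANN speaker modeling and change-point decision stages are not sketched.

```python
import math
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one (windowed) analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, i.e. the inverse filter
    A(z) = 1 + sum_k a[k] z^-k, so the prediction error is
    e[n] = s[n] + sum_k a[k] s[n-k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the lower-order solution
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction error energy
    return a


def lp_residual(signal: Sequence[float], order: int = 10,
                frame_len: int = 160, hop: int = 80) -> List[float]:
    """Inverse-filter the signal block by block to estimate the LP residual.

    Assumed parameters (not from the paper): order 10, 160-sample frames,
    80-sample hop. Coefficients come from a Hamming-windowed frame starting
    at each block; the unwindowed samples of that block are then filtered
    with A(z), using the true past samples of the signal as filter history.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    residual = [0.0] * len(signal)
    for start in range(0, len(signal), hop):
        frame = signal[start:start + frame_len]
        windowed = [x * w for x, w in zip(frame, window)]
        a = levinson_durbin(autocorrelation(windowed, order), order)
        for n in range(start, min(start + hop, len(signal))):
            e = signal[n]
            for k in range(1, order + 1):
                if n - k >= 0:
                    e += a[k] * signal[n - k]
            residual[n] = e
    return residual
```

For a voiced segment `x` (a list of samples), `lp_residual(x)` returns a residual of the same length. In the approach the abstract outlines, one-second stretches of such a residual would then serve as training data for the per-speaker AANN models; the network architecture and decision rule are left to the paper itself.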


Pages: 153-161

Number of pages: 9

