Speaker change detection in casual conversations using excitation source features

Cited by: 4
Authors
Dhananjaya, N. [1]
Yegnanarayana, B. [2]
Affiliations
[1] Indian Inst Technol, Madras 600036, Tamil Nadu, India
[2] Int Inst Informat Technol, Hyderabad, Andhra Pradesh, India
Keywords
speaker change detection; multispeaker conversation; autoassociative neural network (AANN) models; excitation source features; linear prediction (LP) residual;
DOI
10.1016/j.specom.2007.08.003
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
In this paper we propose a method for speaker change detection using features of the excitation source of the speech production mechanism. The method uses neural network models to capture the speaker-specific information from a signal that predominantly represents the excitation source. The focus in this paper is on speaker change detection in casual telephone conversations, in which short (<5 s) speaker turns are common. When only a limited amount of speech data is available, excitation source features are a better choice for modeling a speaker than vocal tract system features. The linear prediction residual is used as an estimate of the excitation source signal. Autoassociative neural network models are proposed to capture the higher-order relations among the samples of the residual signal. Speaker models are generated for every one second of voiced speech from the first few seconds of the conversation. These models are used to detect the speaker change points. The performance of the proposed method for speaker change detection is evaluated on a database containing several two-speaker conversations. (C) 2007 Elsevier B.V. All rights reserved.
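The linear prediction residual mentioned in the abstract can be illustrated with a minimal sketch. It assumes the autocorrelation method with a Levinson-Durbin recursion, treats the whole input as a single analysis frame, and leaves the LP order to the caller. None of these choices come from the paper, the function names `lp_coefficients` and `lp_residual` are hypothetical, and the autoassociative neural network stage is not shown.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a[0..order] (a[0] = 1) for one frame.

    Assumption: autocorrelation method solved by Levinson-Durbin.
    The inverse filter is A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order].
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order):
    """Inverse-filter the frame with A(z) to obtain the LP residual."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # e[n] = s[n] + sum_j a[j] * s[n - j], with zero initial conditions.
    return np.convolve(frame, a)[: len(frame)]
```

In a sketch like this the residual keeps mostly what the linear predictor could not explain, which is why it serves as a rough estimate of the excitation source. A practical system would normally process short overlapping frames rather than the whole signal at once.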
Pages: 153-161
Number of pages: 9
Related papers
50 in total
  • [41] Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations
    Inoue, Koji
    Wakabayashi, Yukoh
    Yoshimoto, Hiromasa
    Takanashi, Katsuya
    Kawahara, Tatsuya
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3086 - 3090
  • [42] Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection
    Sari, Leda
    Hasegawa-Johnson, Mark
    Thomas, Samuel
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 324 - 333
  • [43] Proposal of speaker change detection system considering speaker overlap
    Park, Jisu
    Yun, Young-Sun
    Cha, Shin
    Park, Jeon Gue
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 466 - 472
  • [44] Robust Speaker Change Detection Using Kernel-Gaussian Model
    Gao, Jie
    Zhang, Xiang
    Zhao, Qingwei
    Yan, Yonghong
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2494 - 2497
  • [45] A novel speaker change detection algorithm
    Yu, Xiaoqing
    Tan, Haiying
    Wan, Wanggen
    2007 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS PROCEEDINGS, VOLS 1 AND 2: VOL 1: COMMUNICATION THEORY AND SYSTEMS; VOL 2: SIGNAL PROCESSING, COMPUTATIONAL INTELLIGENCE, CIRCUITS AND SYSTEMS, 2007, : 607 - +
  • [46] An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs
    Mateju, Lukas
    Cerva, Petr
    Zdansky, Jindrich
    INTERSPEECH 2019, 2019, : 649 - 653
  • [47] Unsupervised speaker change detection using SVM training misclassification rate
    Lin, Po-Chuan
    Wang, Jia-Ching
    Wang, Jhing-Fa
    Sung, Hao-Ching
    IEEE TRANSACTIONS ON COMPUTERS, 2007, 56 (09) : 1234 - 1244
  • [48] Efficient speaker change detection using adapted Gaussian mixture models
    Malegaonkar, Amit S.
    Ariyaeeinia, Aladdin M.
    Sivakumaran, Perasiriyan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (06): : 1859 - 1869
  • [49] Combining vocal source and MFCC features for enhanced speaker recognition performance using GMMs
    Hosseinzadeh, Danoush
    Krishnan, Sridhar
    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, : 365 - 368
  • [50] Detection of instants of glottal closure using characteristics of excitation source
    Guruprasad, S.
    Yegnanarayana, B.
    Murty, K. Sri Rama
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2572 - +