Speaker change detection in casual conversations using excitation source features

Cited by: 4
Authors
Dhananjaya, N. [1]
Yegnanarayana, B. [2]
Affiliations
[1] Indian Inst Technol, Madras 600036, Tamil Nadu, India
[2] Int Inst Informat Technol, Hyderabad, Andhra Pradesh, India
Keywords
speaker change detection; multispeaker conversation; autoassociative neural network (AANN) models; excitation source features; linear prediction (LP) residual
DOI
10.1016/j.specom.2007.08.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we propose a method for speaker change detection using features of the excitation source of the speech production mechanism. The method uses neural network models to capture speaker-specific information from a signal that predominantly represents the excitation source. The focus of this paper is on speaker change detection in casual telephone conversations, in which short (<5 s) speaker turns are common. When only a limited amount of speech data is available, excitation source features are a better choice for modeling a speaker than vocal tract system features. The linear prediction (LP) residual is used as an estimate of the excitation source signal. Autoassociative neural network (AANN) models are proposed to capture the higher-order relations among the samples of the residual signal. Speaker models are generated for every one second of voiced speech from the first few seconds of the conversation, and these models are used to detect the speaker change points. The performance of the proposed method is evaluated on a database containing several two-speaker conversations. (C) 2007 Elsevier B.V. All rights reserved.
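The abstract states that the LP residual is used as an estimate of the excitation source signal. As an illustration only, the following sketch computes the LP residual of a single speech frame using the autocorrelation method, the Levinson-Durbin recursion, and inverse filtering with A(z). The function names, the per-frame handling, and the default order of 10 (a common choice for 8 kHz telephone speech) are assumptions and are not taken from the paper; the AANN modeling and change-detection steps are not shown.

```python
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Return r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, so the prediction-error (inverse)
    filter is A(z) = sum_k a[k] z^-k and e[n] = sum_k a[k] x[n - k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction-error energy at order 0
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. silence): keep the lower-order solution
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for order i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lp_residual(frame: List[float], order: int = 10) -> List[float]:
    """Hypothetical helper: inverse-filter one frame to obtain its LP residual."""
    r = autocorrelation(frame, order)
    a = levinson_durbin(r, order)
    return [
        sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
        for n in range(len(frame))
    ]


if __name__ == "__main__":
    # Impulse response of an all-pole filter 1 / (1 - 0.9 z^-1): the residual
    # should be close to a single impulse at n = 0.
    x = [0.9 ** n for n in range(200)]
    e = lp_residual(x, order=2)
    print(round(e[0], 6), max(abs(v) for v in e[1:]) < 1e-6)
```

In a real front end, the frames would typically be windowed and overlapped, and the per-frame residuals concatenated; those choices are left out here to keep the sketch focused on the residual computation itself.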
Pages: 153-161
Number of pages: 9
Related papers
50 in total
  • [31] Speaker Change Detection Using Variable Segments for Video Indexing
    Tam, King Yiu
    Lay, Jose
    Levy, David
    ADVANCES IN MULTIMEDIA MODELING, PT I, 2011, 6523 : 296 - 306
  • [32] Exploration of Vocal Excitation Modulation Features for Speaker Recognition
    Wang, Ning
    Ching, P. C.
    Lee, Tan
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 916 - 919
  • [33] SPEAKER RECOGNITION FOR MULTI-SPEAKER CONVERSATIONS USING X-VECTORS
    Snyder, David
    Garcia-Romero, Daniel
    Sell, Gregory
    McCree, Alan
    Povey, Daniel
    Khudanpur, Sanjeev
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5796 - 5800
  • [34] Significance of excitation source sequence information for Speaker Verification
    Agarwal, Ayush
    Mishra, Jagabandhu
    Prasanna, S. R. Mahadeva
    2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022,
  • [35] Use of vocal source features in speaker segmentation
    Chan, W. N.
    Lee, Tan
    Zheng, Nengheng
    Ouyang, Hua
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 657 - 660
  • [36] Spoken Language Change Detection Inspired by Speaker Change Detection
    Mishra, Jagabandhu
    Prasanna, S. R. M.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (10) : 6373 - 6398
  • [37] Estimation of age from speech using excitation source features
    Avikal, Shwetank
    Sharma, Kritika
    Barthwal, Anuragh
    Kumar, K. C. Nithin
    Badhotiya, Gaurav Kumar
    MATERIALS TODAY-PROCEEDINGS, 2021, 46 : 11046 - 11049
  • [38] Recognition of Emotions from Speech using Excitation Source Features
    Koolagudi, Shashidhar G.
    Devliyal, Swati
    Chawla, Bhavna
    Barthwal, Anurag
    Rao, K. Sreenivasa
    INTERNATIONAL CONFERENCE ON MODELLING OPTIMIZATION AND COMPUTING, 2012, 38 : 3409 - 3417
  • [39] Robust Speaker Recognition Using Denoised Vocal Source and Vocal Tract Features
    Wang, Ning
    Ching, P. C.
    Zheng, Nengheng
    Lee, Tan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (01) : 196 - 205
  • [40] Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition
    Mori, K
    Nakagawa, S
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 413 - 416