DISTBIC: A speaker-based segmentation for audio data indexing

Cited by: 137
Authors
Delacourt, P [1]
Wellekens, CJ [1]
Affiliations
[1] Inst Eurecom, F-06904 Sophia Antipolis, France
Keywords
speaker turn detection; generalized likelihood ratio; Bayesian information criterion
DOI
10.1016/S0167-6393(00)00027-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither speaker model, nor speech model). However, we assume that people do not speak simultaneously and that we have no real-time constraints. We review existing techniques and propose a new segmentation method, which combines two different segmentation techniques. This method, called DISTBIC, is organized into two passes: first the most likely speaker turns are detected, and then they are validated or discarded. The advantage of our algorithm is its efficiency in detecting speaker turns even close to one another (i.e., separated by a few seconds). (C) 2000 Elsevier Science B.V. All rights reserved.
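The abstract describes a second pass that validates or discards candidate speaker turns, and the keywords name the Bayesian information criterion. The following is a minimal sketch of how such a validation score could be computed. It is not the authors' implementation. It assumes that each segment is modeled by a single full-covariance Gaussian, that the penalty weight `lam` is a free tuning parameter, and that a positive score supports a speaker change (other presentations use the opposite sign). The function names `log_det_cov` and `delta_bic` are hypothetical.

```python
import numpy as np

def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of frames x (n_frames, dim)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(x, t, lam=1.0):
    """BIC gain of splitting x at frame t into two Gaussians instead of one.

    Assumed convention: a positive value favors a speaker change at t.
    """
    n, d = x.shape
    left, right = x[:t], x[t:]
    # Log-likelihood ratio term: one Gaussian versus two.
    ratio = 0.5 * (n * log_det_cov(x)
                   - len(left) * log_det_cov(left)
                   - len(right) * log_det_cov(right))
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return ratio - penalty

# Synthetic illustration: two "speakers" with different mean feature vectors.
rng = np.random.default_rng(0)
two_speakers = np.vstack([rng.normal(0.0, 1.0, (200, 4)),
                          rng.normal(3.0, 1.0, (200, 4))])
one_speaker = rng.normal(0.0, 1.0, (400, 4))
print(delta_bic(two_speakers, 200) > 0)  # candidate turn would be kept
print(delta_bic(one_speaker, 200) > 0)   # candidate turn would be discarded
```

In a two-pass scheme of this kind, the candidate positions `t` would come from the first pass (for example, peaks of a distance curve computed between adjacent windows), and only candidates with a supporting score would be retained.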
Pages: 111-126
Number of pages: 16
Related papers
  • [1] Audio data indexing : use of second-order statistics for speaker-based segmentation
    Delacourt, P
    Wellekens, C
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 2, 1999, : 959 - 963
  • [2] A two-level method for unsupervised speaker-based audio segmentation
    Zhang, Shilei
    Zhang, Shuwu
    Xu, Bo
    [J]. 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 2006, : 298 - +
  • [3] BINSEG: An Efficient Speaker-based Segmentation Technique
    Zdansky, Jindrich
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2182 - 2185
  • [4] Automatic segmentation and clustering for speaker indexing of audio databases
    Chen, YX
    Gao, J
    Wang, Q
    [J]. PROCEEDINGS OF THE 11TH JOINT INTERNATIONAL COMPUTER CONFERENCE, 2005, : 399 - 403
  • [5] Hybrid speaker-based segmentation system using model-level clustering
    Kim, HG
    Ertelt, D
    Sikora, T
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 745 - 748
  • [6] A SPEAKER-BASED APPROACH TO ASPECT
    SMITH, C
    [J]. LINGUISTICS AND PHILOSOPHY, 1986, 9 (01) : 97 - 115
  • [7] Audio-guided audiovisual data segmentation, indexing, and retrieval
    Zhang, T
    Kuo, CCJ
    [J]. STORAGE AND RETRIEVAL FOR IMAGE AND VIDEO DATABASES VII, 1998, 3656 : 316 - 327
  • [8] The Impact of Audio Segmentation to Speaker Tracking in Broadcast News Data
    Zibert, Janez
    [J]. ELEKTROTEHNISKI VESTNIK-ELECTROTECHNICAL REVIEW, 2008, 75 (04) : 205 - 210
  • [9] An unsupervised scheme for speaker indexing of audio databases
    Chen, Yanxiang
    Liu, Ming
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 3, 2009, : 90 - +
  • [10] Speaker dependent video indexing based on audio-visual interaction
    Tsekeridou, S
    Pitas, I
    [J]. 1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1, 1998, : 358 - 362