DISTBIC: A speaker-based segmentation for audio data indexing

Times cited: 137
Authors
Delacourt, P [1]
Wellekens, CJ [1]
Affiliation
[1] Inst Eurecom, F-06904 Sophia Antipolis, France
Keywords
speaker turn detection; generalized likelihood ratio; Bayesian information criterion
DOI
10.1016/S0167-6393(00)00027-3
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither speaker model, nor speech model). However, we assume that people do not speak simultaneously and that we have no real-time constraints. We review existing techniques and propose a new segmentation method, which combines two different segmentation techniques. This method, called DISTBIC, is organized into two passes: first the most likely speaker turns are detected, and then they are validated or discarded. The advantage of our algorithm is its efficiency in detecting speaker turns even close to one another (i.e., separated by a few seconds). (C) 2000 Elsevier Science B.V. All rights reserved.
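The two-pass organization described in the abstract (detect likely speaker turns, then validate or discard them) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional features, a single Gaussian per window, a generalized likelihood ratio (GLR) distance for the first pass, and a Bayesian information criterion (BIC) test for the second, with the convention that a positive `delta_bic` favors a speaker change. The function names, the window and step sizes, and the penalty weight `lam` are all hypothetical.

```python
import math


def log_variance(xs):
    """Log of the maximum-likelihood variance of a 1-D sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))  # floor avoids log(0) on constant data


def glr_distance(left, right):
    """GLR between 'one Gaussian for both windows' and 'one Gaussian each'."""
    n1, n2 = len(left), len(right)
    return 0.5 * ((n1 + n2) * log_variance(left + right)
                  - n1 * log_variance(left)
                  - n2 * log_variance(right))


def delta_bic(left, right, lam=1.0):
    """GLR minus a complexity penalty; positive values favor a change here."""
    n = len(left) + len(right)
    penalty = 0.5 * 2 * math.log(n)  # 2 extra parameters: mean and variance
    return glr_distance(left, right) - lam * penalty


def detect_turns(signal, win=20, step=5, lam=1.0):
    """Two-pass sketch: distance-curve maxima, then BIC validation."""
    # Pass 1: slide two adjacent windows and keep local maxima of the distance.
    positions = list(range(win, len(signal) - win + 1, step))
    dist = [glr_distance(signal[p - win:p], signal[p:p + win])
            for p in positions]
    candidates = [positions[i] for i in range(1, len(dist) - 1)
                  if dist[i] > dist[i - 1] and dist[i] >= dist[i + 1]]

    # Pass 2: test each candidate against the segments on either side of it;
    # a rejected candidate is dropped and its two segments are merged.
    accepted, start = [], 0
    for i, c in enumerate(candidates):
        end = candidates[i + 1] if i + 1 < len(candidates) else len(signal)
        if delta_bic(signal[start:c], signal[c:end], lam) > 0:
            accepted.append(c)
            start = c
    return accepted


# Toy data: two "speakers" with different feature means, switching at index 60.
low = [0.1 * ((i % 5) - 2) for i in range(60)]
high = [5.0 + 0.1 * ((i % 5) - 2) for i in range(60)]
turns = detect_turns(low + high)
print(turns)
```

Real systems work on multidimensional acoustic feature vectors with full covariance matrices; the univariate case is used here only to keep the sketch dependency-free.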
Pages: 111-126
Number of pages: 16