DISTBIC: A speaker-based segmentation for audio data indexing

Cited by: 137
Authors
Delacourt, P [1 ]
Wellekens, CJ [1 ]
Affiliation
[1] Inst Eurecom, F-06904 Sophia Antipolis, France
Keywords
speaker turn detection; generalized likelihood ratio; Bayesian information criterion
DOI
10.1016/S0167-6393(00)00027-3
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither speaker model, nor speech model). However, we assume that people do not speak simultaneously and that we have no real-time constraints. We review existing techniques and propose a new segmentation method, which combines two different segmentation techniques. This method, called DISTBIC, is organized into two passes: first the most likely speaker turns are detected, and then they are validated or discarded. The advantage of our algorithm is its efficiency in detecting speaker turns even close to one another (i.e., separated by a few seconds). (C) 2000 Elsevier Science B.V. All rights reserved.
Pages: 111-126
Page count: 16
Related papers
50 in total
  • [21] Speaker indexing in large audio databases using anchor models
    Sturim, DE
    Reynolds, DA
    Singer, E
    Campbell, JP
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 429 - 432
  • [22] ALISP-based Data Compression for Generic Audio Indexing
    Khemiri, Houssemeddine
    Petrovska-Delacretaz, Dijana
    Chollet, Gerard
    [J]. 2014 DATA COMPRESSION CONFERENCE (DCC 2014), 2014, : 273 - 282
  • [23] Transform-based indexing of audio data for multimedia databases
    Subramanya, SR
    Simha, R
    Narahari, B
    Youssef, A
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS '97, PROCEEDINGS, 1997, : 211 - 218
  • [24] Speaker indexing in audio archives using Gaussian mixture scoring simulation
    Aronowitz, H
    Burshtein, D
    Amir, A
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3361 : 243 - 252
  • [25] A simulated annealing approach to speaker segmentation in audio databases
    Leiva-Murillo, Jose M.
    Salcedo-Sanz, Sancho
    Gallardo-Antolin, Ascension
    Artes-Rodriguez, Antonio
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2008, 21 (04) : 499 - 508
  • [26] Joint Speaker Segmentation, Localization and Identification for Streaming Audio
    Schmalenstroeer, Joerg
    Haeb-Umbach, Reinhold
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 453 - 456
  • [27] Content-based indexing and retrieval of audio data using wavelets
    Li, GH
    Khokhar, AA
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 885 - 888
  • [28] Negation in Modern Greek revisited: selecting between two speaker-based accounts
    Veloudis, Ioannis
    [J]. FOLIA LINGUISTICA, 2023, 57 (03) : 689 - 721
  • [29] DuG: Dual speaker-based acoustic gesture recognition for humanoid robot control
    Ai, Haojun
    Tang, Kaifeng
    Han, Liangliang
    Wang, Yifeng
    Zhang, Sheng
    [J]. INFORMATION SCIENCES, 2019, 504 : 84 - 94
  • [30] A generic audio classification and segmentation approach for multimedia indexing and retrieval
    Kiranyaz, S
    Qureshi, AF
    Gabbouj, M
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03) : 1062 - 1081