Unsupervised segmentation of meeting configurations and activities using speech activity detection

Cited by: 0
Authors
Brdiczka, Oliver [1]
Vaufreydaz, Dominique [1]
Maisonnasse, Jerome [1]
Reignier, Patrick [1]
Affiliations
[1] INRIA Rhone Alpes, 655 Av Europe, F-38330 Montbonnot St Martin, France
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the problem of segmenting small group meetings in order to detect different group configurations and activities in an intelligent environment. Our approach takes speech activity detection of individuals attending a meeting as input. The goal is to separate distinct distributions of speech activity observations corresponding to distinct group configurations and activities. We propose an unsupervised method based on the calculation of the Jeffrey divergence between histograms of speech activity observations. These histograms are generated from adjacent windows of variable size slid from the beginning to the end of a meeting recording. The peaks of the resulting Jeffrey divergence curves are detected using successive robust mean estimation. After a merging and filtering process, the retained peaks are used to select the best model, i.e., the best speech activity distribution allocation for a given meeting recording. These distinct distributions can be interpreted as distinct segments of group configuration and activity. To evaluate the method, we recorded six small group meetings and measured the correspondence between detected segments and labeled group configurations and activities. The results are promising, particularly given that the method is completely unsupervised.
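The core step described in the abstract, comparing histograms from two adjacent windows with the Jeffrey divergence, can be illustrated with a minimal sketch. Everything below is an assumption for illustration rather than the authors' implementation: the function names, the toy observation coding, the single fixed window size (the abstract mentions windows of variable size), and the specific smoothed symmetric form of the divergence, which is one commonly used definition. Peak detection by successive robust mean estimation, merging, filtering, and model selection are not shown; the sketch simply takes the largest value of the curve.

```python
import math
from collections import Counter


def histogram(observations, symbols):
    """Normalized histogram of discrete observations over a fixed symbol set."""
    counts = Counter(observations)
    total = len(observations)
    return [counts[s] / total for s in symbols]


def jeffrey_divergence(p, q):
    """Symmetric divergence between two normalized histograms.

    Assumed form: sum over bins of p*log(p/m) + q*log(q/m), with m = (p+q)/2.
    Empty bins contribute zero to the sum.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        m = (pi + qi) / 2.0
        if pi > 0:
            total += pi * math.log(pi / m)
        if qi > 0:
            total += qi * math.log(qi / m)
    return total


def divergence_curve(observations, window):
    """Slide two adjacent windows of `window` observations along the sequence
    and record the divergence between their histograms at each boundary t."""
    symbols = sorted(set(observations))
    curve = []
    for t in range(window, len(observations) - window + 1):
        left = histogram(observations[t - window:t], symbols)
        right = histogram(observations[t:t + window], symbols)
        curve.append((t, jeffrey_divergence(left, right)))
    return curve


# Hypothetical toy data: each observation encodes who is speaking
# (0 = silence, 1 = speaker A only, 2 = speaker B only, 3 = both).
toy = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1] + [2, 3, 2, 2, 3, 2, 3, 2, 2, 3]
curve = divergence_curve(toy, window=5)
peak_t, peak_value = max(curve, key=lambda item: item[1])
```

In this toy sequence the speech activity pattern changes at index 10, and the curve is largest at that boundary because the two adjacent windows share no observation values there.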
Pages: 195 / +
Page count: 2