LEVERAGING LSTM MODELS FOR OVERLAP DETECTION IN MULTI-PARTY MEETINGS

Authors
Sajjan, Neeraj [1]
Ganesh, Shobhana [1]
Sharma, Neeraj [1]
Ganapathy, Sriram [1]
Ryant, Neville [2]
Affiliations
[1] Indian Inst Sci, Learning & Extract Acoust Patterns LEAP Lab, Bangalore 560012, Karnataka, India
[2] Univ Penn, Linguist Data Consortium, Philadelphia, PA 19104 USA
Keywords
Overlap Detection; LSTM Modeling; Speaker Diarization; Conversational Speech Analysis; Speech Detection; Usable Speech
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The detection of overlapping speech segments is of key importance in speech applications involving analysis of multi-party conversations. The detection problem is challenging because overlapping speech segments are typically captured as short speech utterances in far-field microphone recordings. In this paper, we propose detection of overlap segments using a neural network architecture consisting of long short-term memory (LSTM) models. The neural network architecture learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments. To evaluate model performance, we perform experiments on simulated overlapped speech generated from the TIMIT database and on natural multi-talker conversational speech in the Augmented Multi-party Interaction (AMI) meeting corpus. The proposed approach yields improvements over a Gaussian mixture model-based overlap detection system. Furthermore, as an application of overlap detection, integrating overlap detection into a speaker diarization task is shown to improve the diarization error rate.
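The abstract mentions simulated overlapped speech but does not say how it was generated. As a rough illustration only, the sketch below adds one utterance to another at a sample offset and derives frame-level overlap labels of the kind a frame classifier such as an LSTM could be trained on. The function name `mix_with_overlap`, the 400-sample frame and 160-sample hop (25 ms and 10 ms at 16 kHz), the 50% labeling threshold, and the simplification that every sample of an utterance counts as speech are all assumptions made here, not details taken from the paper.

```python
import numpy as np


def mix_with_overlap(a, b, offset, frame_len=400, hop=160):
    """Add utterance `b` to utterance `a`, starting at sample `offset`.

    Hypothetical helper: returns the mixture and one label per frame,
    where 1 means more than half of the frame contains both talkers.
    Assumes every sample of each utterance is speech (no silence trimming).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = max(len(a), offset + len(b))

    # Sum the two waveforms on a common time axis.
    mix = np.zeros(n, dtype=np.float64)
    mix[:len(a)] += a
    mix[offset:offset + len(b)] += b

    # Sample-level mask of the region where both utterances are present.
    both = np.zeros(n, dtype=bool)
    start, end = offset, min(len(a), offset + len(b))
    if end > start:
        both[start:end] = True

    # Frame-level labels by majority vote over the samples in each frame.
    n_frames = 1 + (n - frame_len) // hop if n >= frame_len else 0
    labels = np.array(
        [both[i * hop:i * hop + frame_len].mean() > 0.5 for i in range(n_frames)],
        dtype=np.int64,
    )
    return mix, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    talker_a = rng.standard_normal(16000)  # stand-in for a 1 s utterance
    talker_b = rng.standard_normal(16000)
    mixture, frame_labels = mix_with_overlap(talker_a, talker_b, offset=8000)
    print(mixture.shape, frame_labels.shape, int(frame_labels.sum()))
```

In practice the two inputs would be real utterances from different speakers, and a speech activity mask would replace the "everything is speech" simplification.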
Pages: 5249-5253
Number of pages: 5