Speaker diarization for multi-microphone meetings using only between-channel differences

Cited by: 0
Authors
Pardo, Jose M. [1, 2]
Anguera, Xavier [1, 3]
Wooters, Chuck [1 ]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94708 USA
[2] Univ Politecn Madrid, E-28040 Madrid, Spain
[3] Tech Univ Catalonia, Barcelona, Spain
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method to extract speaker turn segmentation from multiple distant microphones (MDM) using only the delay values obtained by cross-correlating the available channels. The method is robust to the number of speakers (which is unknown to the system), the number of channels, and the acoustics of the room. The delays between channels are processed and clustered to obtain a segmentation hypothesis. We have obtained a 31.2% diarization error rate (DER) on NIST's RT05s MDM conference room evaluation set. For an MDM subset of NIST's RT04s development set, we have obtained 36.93% DER and 35.73% DER*. Compared with the results presented by Ellis and Liu [8], who also used between-channel differences on the same data, we have obtained a 43% relative improvement in the error rate.
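The delay extraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain time-domain cross-correlation between a reference channel and one other channel; the abstract does not say which cross-correlation variant, window length, or lag range the authors used, so the function name `estimate_delay` and the parameter `max_lag` are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_delay(ref, other, max_lag):
    """Estimate by how many samples `other` trails `ref`.

    Illustrative assumption: plain cross-correlation, not necessarily
    the variant used in the paper. A positive result means `other`
    is a delayed copy of `ref`.
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    # Full cross-correlation: output index i corresponds to
    # lag i - (len(ref) - 1).
    corr = np.correlate(other, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(other))
    # Restrict the search to physically plausible lags.
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(corr[keep])])


# Synthetic check: a noise signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(400)
y = np.concatenate([np.zeros(5), x[:-5]])
print(estimate_delay(x, y, max_lag=20))
```

In a diarization setting, one such delay would presumably be computed per analysis window and per channel pair, and the resulting delay vectors would then be clustered; those later steps are not sketched here because the abstract gives no details about them.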
Pages: 257 / +
Page count: 3