A REAL-TIME SPEAKER DIARIZATION SYSTEM BASED ON SPATIAL SPECTRUM

Cited by: 13
Authors
Zheng, Siqi [1]
Huang, Weilong [1]
Wang, Xianliang [1]
Suo, Hongbin [1]
Feng, Jinwei [1]
Yan, Zhijie [1]
Affiliation
[1] Alibaba Group, Speech Lab, Hangzhou, Zhejiang, People's Republic of China
Keywords
Speaker diarization; speaker localization; microphone array; SPEECH;
DOI
10.1109/ICASSP39728.2021.9413544
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper we describe a speaker diarization system that localizes and identifies all speakers present in a conversation or meeting. We propose a novel systematic approach to several long-standing challenges in speaker diarization: (1) segmenting and separating overlapping speech from two speakers; (2) estimating the number of speakers when participants may enter or leave the conversation at any time; (3) providing accurate speaker identification on short, text-independent utterances; (4) tracking speakers' movement during the conversation; and (5) detecting speaker changes in real time. First, a differential directional microphone array-based approach is used to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker locations. Third, an instant speaker number detector is developed to trigger the mechanism that separates overlapped speech. The results suggest that our system effectively incorporates spatial information and achieves significant gains.
Pages: 7208-7212
Number of pages: 5