A REAL-TIME SPEAKER DIARIZATION SYSTEM BASED ON SPATIAL SPECTRUM

Cited by: 13

Authors:

Zheng, Siqi [1]

Huang, Weilong [1]

Wang, Xianliang [1]

Suo, Hongbin [1]

Feng, Jinwei [1]

Yan, Zhijie [1]

Affiliation:

[1] Alibaba Group, Speech Lab, Hangzhou, Zhejiang, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Speaker diarization; speaker localization; microphone array; speech

DOI:

10.1109/ICASSP39728.2021.9413544

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this paper, we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in speaker diarization tasks: (1) to segment and separate overlapping speech from two speakers; (2) to estimate the number of speakers when participants may enter or leave the conversation at any time; (3) to provide accurate speaker identification on short text-independent utterances; (4) to track speakers' movement during the conversation; (5) to detect speaker changes in real time. First, a differential directional microphone array-based approach is exploited to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker location. Third, an instant speaker number detector is developed to trigger the mechanism that separates overlapped speech. The results suggest that our system effectively incorporates spatial information and achieves significant gains.


Pages: 7208-7212

Number of pages: 5
