Speaker Diarization: A Review of Recent Research

Cited by: 392

Authors:

Anguera Miro, Xavier [1]

Bozonnet, Simon [2]

Evans, Nicholas [2]

Fredouille, Corinne [3]

Friedland, Gerald [4]

Vinyals, Oriol [4]

Affiliations:

[1] Telefon Res, Multimedia Res Grp, Barcelona 08021, Spain

[2] EURECOM, Multimedia Commun Dept, F-06904 Sophia Antipolis, France

[3] Univ Avignon, CERI LIA, F-84911 Avignon 9, France

[4] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 2

Keywords:

Meetings; rich transcription; speaker diarization; features

DOI:

10.1109/TASL.2011.2125954

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.

Pages: 356-370

Number of pages: 15

Related record: Anguera, Xavier. Speaker diarization: A review of recent research (vol 20, pg 356, 2012). IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06): 1308.