Speaker Diarization: A Review of Recent Research

Cited by: 392
Authors
Anguera Miro, Xavier [1]
Bozonnet, Simon [2]
Evans, Nicholas [2]
Fredouille, Corinne [3]
Friedland, Gerald [4]
Vinyals, Oriol [4]
Affiliations
[1] Telefon Res, Multimedia Res Grp, Barcelona 08021, Spain
[2] EURECOM, Multimedia Commun Dept, F-06904 Sophia Antipolis, France
[3] Univ Avignon, CERI LIA, F-84911 Avignon 9, France
[4] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
Meetings; rich transcription; speaker diarization; MEETINGS; FEATURES;
DOI
10.1109/TASL.2011.2125954
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper, we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
Pages: 356-370
Page count: 15
Related papers
50 in total
  • [1] Speaker diarization: A review of recent research (vol 20, pg 356, 2012)
    Anguera, Xavier
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06) : 1308 - 1308
  • [2] A review of speaker diarization: Recent advances with deep learning
    Park, Tae Jin
    Kanda, Naoyuki
    Dimitriadis, Dimitrios
    Han, Kyu J.
    Watanabe, Shinji
    Narayanan, Shrikanth
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 72
  • [3] A review on speaker diarization systems and approaches
    Moattar, M. H.
    Homayounpour, M. M.
    [J]. SPEECH COMMUNICATION, 2012, 54 (10) : 1065 - 1103
  • [4] A Free Synthetic Corpus for Speaker Diarization Research
    Edwards, Erik
    Brenndoerfer, Michael
    Robinson, Amanda
    Sadoughi, Najmeh
    Finley, Greg P.
    Korenevsky, Maxim
    Axtmann, Nico
    Miller, Mark
    Suendermann-Oeft, David
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 113 - 122
  • [5] SPEAKER DIARIZATION WITH LSTM
    Wang, Quan
    Downey, Carlton
    Wan, Li
    Mansfield, Philip Andrew
    Moreno, Ignacio Lopez
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5239 - 5243
  • [6] Multimodal Speaker Diarization
    Noulas, Athanasios
    Englebienne, Gwenn
    Krose, Ben J. A.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
  • [7] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015 : 2082 - 2086
  • [8] Trainable Speaker Diarization
    Aronowitz, Hagai
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007 : 2021 - 2024
  • [9] TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
    Pang, Bowen
    Zhao, Huan
    Zhang, Gaosheng
    Yang, Xiaoyue
    Sun, Yang
    Zhang, Li
    Wang, Qing
    Xie, Lei
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022 : 502 - 506
  • [10] New Advances in Speaker Diarization
    Aronowitz, Hagai
    Zhu, Weizhong
    Suzuki, Masayuki
    Kurata, Gakuto
    Hoory, Ron
    [J]. INTERSPEECH 2020, 2020 : 279 - 283