End-To-End Neural Speaker Diarization Through Step-Function

Cited by: 0
Authors
Latypov, Rustam [1 ]
Stolov, Evgeni [1 ]
Affiliation
[1] Kazan Fed Univ, Dept Comp Sci, Kazan, Russia
Funding
Russian Science Foundation
Keywords
speaker diarization; step-function; neural net
DOI
10.1109/AICT52784.2021.9620513
CLC (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper presents an approach to the speaker diarization problem based on a step-wise representation of the speech signal. Known solutions to the diarization problem rely on complex neural-network models, so training such a network requires substantial computation. The goal of this research is to construct an algorithm that uses very limited resources during a conversation among a few people, running on just a regular notebook. The time needed to train the system for a given set of speakers is also minimal. This goal is attained by transforming the network's input signal into a step-function taking three values, which makes a simple neural-network model sufficient for end-to-end diarization. For training, we use a segmented speech file in which each segment belongs to a single speaker; the number of speakers is known in advance. Each segment is converted into a step-function by applying a threshold value estimated with the fast algorithm we developed. Because the neural network is end-to-end, the clustering step of the speaker diarization pipeline is eliminated. Experiments show that the diarization quality is acceptable.
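The core preprocessing step described in the abstract, converting a speech segment into a three-valued step-function via a threshold, can be sketched as below. This is a minimal illustration, not the authors' method: the exact mapping to three levels and their fast threshold-estimation algorithm are not given in the record, so the `estimate_threshold` rule here (a fraction of the mean absolute amplitude) is a hypothetical stand-in.

```python
def to_step_function(samples, threshold):
    """Quantize each waveform sample into one of three levels: -1, 0, +1.

    Samples above +threshold map to 1, below -threshold map to -1,
    and everything in between maps to 0, yielding a three-valued
    step-function as described in the abstract.
    """
    out = []
    for x in samples:
        if x > threshold:
            out.append(1)
        elif x < -threshold:
            out.append(-1)
        else:
            out.append(0)
    return out


def estimate_threshold(samples, fraction=0.5):
    """Toy threshold estimate: a fraction of the mean absolute amplitude.

    Stand-in for the fast estimation algorithm mentioned in the paper,
    whose details are not available in this record.
    """
    mean_abs = sum(abs(x) for x in samples) / len(samples)
    return fraction * mean_abs


# Example: a short mock segment of normalized audio samples.
signal = [0.05, 0.4, -0.6, 0.02, -0.03, 0.7, -0.5, 0.1]
t = estimate_threshold(signal)
steps = to_step_function(signal, t)
```

The resulting three-valued sequence is what would be fed to the simple end-to-end network in place of the raw waveform.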
Pages: 5