SPEAKER DIARISATION USING 2D SELF-ATTENTIVE COMBINATION OF EMBEDDINGS

Times cited: 0
Authors
Sun, G. [1]
Zhang, C. [1]
Woodland, P. C. [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Keywords
Speaker diarization; d-vector; self-attention; model combination; DIARIZATION
DOI
Not available
CLC classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework that improves performance by combining them into a single embedding, referred to as a c-vector. The combination uses a two-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings. Two types of 2D self-attentive structure are studied: simultaneous combination and consecutive combination, which use a single self-attentive layer and multiple self-attentive layers respectively. The penalty term of the original self-attentive layer, which is jointly minimised with the objective function to encourage diversity among the annotation vectors, is also modified so that the multiple annotation vectors capture not only different local peaks but also the overall trends. Experiments on the AMI meeting corpus show that the modified penalty term reduces the speaker error rate (SER) of d-vector systems by 6% and 21% relative, and that a further 10% relative SER reduction can be obtained using the c-vector from the best 2D self-attentive structure.
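The abstract describes the 2D self-attentive structure only at a high level. The following is a minimal, hypothetical sketch, assuming the "standard self-attentive layer" follows the structured self-attention of Lin et al. (2017), where A = softmax(W2 tanh(W1 H^T)), the pooled output is E = A H, and the diversity penalty is ||A A^T - I||_F^2. It further assumes that every embedding stream (for example an i-vector stream and a d-vector stream) has already been projected to a common dimension d. The function names `simultaneous_2d` and `consecutive_2d`, the head-averaging step, and the random untrained weights are illustrative assumptions, not the authors' implementation. The paper's modified penalty term is not specified in the abstract, so only the original penalty is shown.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive(h: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, Matrix]:
    """Structured self-attentive pooling: A = softmax(W2 tanh(W1 H^T)), E = A H.

    h: N x d input vectors, w1: d_a x d, w2: r x d_a.
    Returns the r x N annotation matrix A (each row sums to 1) and the r x d output E.
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(w1, transpose(h))]
    a = [softmax(row) for row in matmul(w2, hidden)]
    return a, matmul(a, h)


def penalty(a: Matrix) -> float:
    """Original diversity penalty ||A A^T - I||_F^2 (NOT the paper's modified term)."""
    aat = matmul(a, transpose(a))
    n = len(aat)
    return sum((aat[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


def simultaneous_2d(streams: List[Matrix], w1: Matrix, w2: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical simultaneous combination: ONE layer attends over every
    (embedding type, time step) position, averaging across time and type at once.
    Assumes all streams already share the same dimension d."""
    stacked = [frame for stream in streams for frame in stream]
    return self_attentive(stacked, w1, w2)


def consecutive_2d(streams: List[Matrix], w1_t: Matrix, w2_t: Matrix,
                   w1_k: Matrix, w2_k: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical consecutive combination: a first layer pools each stream over
    time, then a second layer pools the per-type results across embedding types.
    Averaging the r heads of the first layer is an assumption of this sketch."""
    per_type = []
    for stream in streams:
        _, e = self_attentive(stream, w1_t, w2_t)
        per_type.append([sum(col) / len(e) for col in zip(*e)])
    return self_attentive(per_type, w1_k, w2_k)


def random_matrix(rows: int, cols: int) -> Matrix:
    """Random untrained weights or inputs, for illustration only."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    d, d_a, r = 4, 3, 2
    # two embedding types, 5 time steps each
    streams = [random_matrix(5, d), random_matrix(5, d)]
    a, c_vec = simultaneous_2d(streams, random_matrix(d_a, d), random_matrix(r, d_a))
    print(len(a[0]), len(c_vec), len(c_vec[0]))  # 10 2 4
    print(penalty(a))
```

In this sketch the simultaneous variant lets a single softmax distribute weight over all 10 (type, time) positions, while the consecutive variant separates the two averaging steps into two layers. The r x d output plays the role of the combined c-vector and could be flattened into a single vector.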
Pages: 5801-5805
Number of pages: 5