JOINT SPEAKER DIARIZATION AND RECOGNITION USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS

Cited by: 0
Authors
Zhou, Zhihan [1]
Zhang, Yichi [1]
Duan, Zhiyao [1]
Affiliation
[1] Univ Rochester, Dept Elect & Comp Engn, 601 Elmwood Ave, Rochester, NY 14627 USA
Funding
National Science Foundation (USA);
Keywords
Speaker diarization; speaker recognition; convolutional neural network; recurrent neural network; speaker change detection;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Speaker diarization (detecting who-spoke-when using relative identity labels) and speaker recognition (detecting absolute identity labels without timing) are different but related tasks that often need to be completed simultaneously. Traditional methods, however, address them independently. In this paper, we propose a method to jointly diarize and recognize speakers from a collection of conversations. This method benefits from the sparsity and temporal smoothness of speakers within a conversation and the large-scale timbre modeling across recordings and speakers. Specifically, we employ one convolutional neural network (CNN) to perform segment-level speaker classification and another CNN to detect the probability of speaker change within a conversation. We then concatenate the output of both CNNs and feed it into a recurrent neural network (RNN) for joint speaker diarization and recognition. Experiments on different datasets show promising performance of our proposed approach.
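The data flow described in the abstract (two per-segment CNN outputs concatenated and passed to an RNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no layer sizes, features, or RNN type, so the plain Elman-style recurrence, the random untrained weights, the hand-written stand-ins for the CNN outputs, and every name below (`fuse`, `rnn_forward`, `random_matrix`) are assumptions made purely for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(speaker_probs, change_probs):
    """Concatenate each segment's speaker posteriors with its change probability."""
    return [list(p) + [c] for p, c in zip(speaker_probs, change_probs)]


def rnn_forward(features, w_in, w_rec, w_out):
    """Forward pass of a plain (Elman) RNN; returns one speaker distribution per segment."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for x in features:
        # The comprehension reads the previous hidden state before it is rebound.
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[j], x))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
        outputs.append(softmax(logits))
    return outputs


def random_matrix(rows, cols, rng):
    """Small random weight matrix; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_speakers, hidden_size = 3, 4

# Hypothetical outputs of the two CNNs for four consecutive segments.
speaker_probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.7, 0.2]]
change_probs = [0.05, 0.10, 0.90, 0.05]

features = fuse(speaker_probs, change_probs)  # each row has num_speakers + 1 values
w_in = random_matrix(hidden_size, num_speakers + 1, rng)
w_rec = random_matrix(hidden_size, hidden_size, rng)
w_out = random_matrix(num_speakers, hidden_size, rng)

posteriors = rnn_forward(features, w_in, w_rec, w_out)
labels = [max(range(num_speakers), key=p.__getitem__) for p in posteriors]
```

Because the weights are untrained, `labels` carries no meaning here. The sketch only shows the shapes involved: each segment contributes a vector of length `num_speakers + 1`, and the recurrence lets the decision for one segment depend on the segments before it, which is how temporal smoothness within a conversation could be exploited.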
Pages: 2496-2500
Page count: 5