Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

Authors
Bhuyan, Amit Kumar [1]
Dutta, Hrishikesh [1]
Biswas, Subir [1]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48823 USA
Keywords
Mel frequency cepstral coefficient; Computational modeling; Accuracy; Oral communication; Training; Bayes methods; Feature extraction; Data models; Computational intelligence; Unsupervised Learning; Bayesian methods; federated learning; distributed processing; Hotelling's t-squared statistic; Bayesian information criterion; cepstral analysis; SEGMENTATION
DOI
10.1109/TETCI.2024.3482855
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a computationally efficient, distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model that can identify the participants in a conversation without requiring a large audio database for training. An unsupervised online update mechanism is proposed for the Federated Learning model, which depends on the cosine similarity of speaker embeddings. Moreover, the proposed diarization system solves the problem of speaker change detection via unsupervised segmentation techniques using Hotelling's T-squared statistic and the Bayesian Information Criterion. In this new approach, speaker change detection is biased around detected quasi-silences, which reduces the severity of the trade-off between the missed detection and false detection rates. Additionally, the computational overhead due to frame-by-frame identification of speakers is reduced via unsupervised clustering of speech segments. The results demonstrate the effectiveness of the proposed training method in the presence of non-IID speech data. They also show a considerable reduction in false and missed detections at the segmentation stage, together with lower computational overhead. Improved accuracy and reduced computational cost make the mechanism suitable for real-time speaker diarization across a distributed IoT audio network.
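The abstract names two standard building blocks: Bayesian Information Criterion segmentation and cosine similarity of speaker embeddings. The sketch below illustrates the textbook forms of both and is not the authors' implementation. It assumes each segment is an array of feature frames (for example MFCC vectors) modeled by a single full-covariance Gaussian. The function names, the `penalty_weight` parameter, and the small ridge term are hypothetical choices made for illustration.

```python
import numpy as np


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    dim = frames.shape[1]
    # A small ridge (an assumption of this sketch) keeps the matrix well-conditioned.
    cov = np.cov(frames, rowvar=False, bias=True) + 1e-6 * np.eye(dim)
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(left, right, penalty_weight=1.0):
    """Textbook Delta-BIC between two adjacent segments of feature frames.

    A positive value favors modeling the segments with two separate
    Gaussians, i.e. it suggests a speaker change at the boundary.
    """
    both = np.vstack([left, right])
    n, dim = both.shape
    n_left, n_right = len(left), len(right)
    # Log-likelihood gain from splitting one Gaussian into two.
    gain = 0.5 * (
        n * _logdet_cov(both)
        - n_left * _logdet_cov(left)
        - n_right * _logdet_cov(right)
    )
    # Extra parameters of the second Gaussian: mean plus symmetric covariance.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In a pipeline of the kind the abstract describes, one would plausibly evaluate `delta_bic` only at candidate boundaries near detected quasi-silences and compare segment embeddings with `cosine_similarity` to decide which speaker model to update. That wiring is an assumption here, since the record gives no such details.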
Pages: 13