MULTI-STREAM CONVOLUTIONAL NEURAL NETWORK WITH FREQUENCY SELECTION FOR ROBUST SPEAKER VERIFICATION

Authors
Yao, Wei [1]
Chen, Shen [2]
Cui, Jiamin [1]
Lou, Yaolin [1]
Affiliations
[1] Zhejiang Univ Water Resources & Elect Power, Coll Elect Engn, Key Lab Technol Rural Water Management Zhejiang Pr, Hangzhou, Peoples R China
[2] Wanbang Digital Energy Co Ltd China, Hangzhou, Peoples R China
Keywords
Deep learning; speaker verification; convolutional neural network; multi-stream; frequency selection
DOI
10.31577/cai20244819
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speaker verification aims to verify whether an input speech utterance belongs to the claimed speaker. Conventionally, such systems are deployed in a single-stream setting, in which the feature extractor operates over the full frequency range. In this paper, we hypothesize that a machine can learn enough to perform the classification task when listening to only part of the frequency range rather than the full range, a technique we call frequency selection, and we further propose a novel multi-stream Convolutional Neural Network (CNN) framework that applies this technique to speaker verification. The proposed framework combines diverse temporal embeddings generated by multiple streams to make acoustic modeling more robust. To diversify these temporal embeddings, we use feature augmentation with frequency selection: the full frequency band is manually segmented into several sub-bands, and the feature extractor of each stream selects which sub-bands to use as its target frequency domain. Unlike the conventional single-stream solution, in which each utterance is processed only once, this framework processes each utterance with multiple streams in parallel. The input to each stream is pre-processed by a frequency selector restricted to a specified frequency range and post-processed by mean normalization. The normalized temporal embeddings from all streams then flow into a pooling layer that produces fused embeddings. We conduct extensive experiments on the VoxCeleb dataset, and the results show that the multi-stream CNN significantly outperforms the single-stream baseline, with relative improvements of 20.53% in minimum Decision Cost Function (minDCF) and 15.28% in Equal Error Rate (EER).
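The abstract describes the data flow but not the implementation, so the following is only a minimal sketch of that flow under stated assumptions, not the authors' method. Assumed here: the full band is split into contiguous, near-equal sub-bands; the frequency selector slices out the bins of the chosen sub-bands; a fixed random linear map (the hypothetical `toy_encoder`) stands in for each stream's CNN; and the pooling layer is a frame-wise average across streams. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = frames; columns = frequency bins or embedding dims


def split_into_subbands(num_bins: int, num_subbands: int) -> List[Tuple[int, int]]:
    """Segment the full band into contiguous, near-equal sub-bands (assumed split)."""
    edges = [round(i * num_bins / num_subbands) for i in range(num_subbands + 1)]
    return [(edges[i], edges[i + 1]) for i in range(num_subbands)]


def select_frequencies(features: Matrix, subbands: Sequence[Tuple[int, int]],
                       selected: Sequence[int]) -> Matrix:
    """Frequency selector: keep only the bins that fall in the chosen sub-bands."""
    cols = [c for i in selected for c in range(*subbands[i])]
    return [[frame[c] for c in cols] for frame in features]


def toy_encoder(features: Matrix, dim: int, seed: int) -> Matrix:
    """Hypothetical stand-in for a stream's CNN: a fixed random linear map per frame."""
    rng = random.Random(seed)
    in_dim = len(features[0])
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(dim)]
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in features]


def mean_normalize(frames: Matrix) -> Matrix:
    """Subtract the per-dimension mean over time from every frame."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def fuse_streams(streams: Sequence[Matrix]) -> Matrix:
    """Assumed pooling layer: element-wise average of the streams' aligned frames."""
    k = len(streams)
    return [[sum(vals) / k for vals in zip(*frames)] for frames in zip(*streams)]


def multi_stream_forward(features: Matrix, stream_bands: Sequence[Sequence[int]],
                         num_subbands: int = 4, dim: int = 8) -> Matrix:
    """Run each stream on its own sub-bands, mean-normalize, then fuse."""
    subbands = split_into_subbands(len(features[0]), num_subbands)
    outputs = []
    for seed, selected in enumerate(stream_bands):
        selected_feats = select_frequencies(features, subbands, selected)
        outputs.append(mean_normalize(toy_encoder(selected_feats, dim, seed)))
    return fuse_streams(outputs)


# Usage: 100 frames x 40 bins of synthetic features, three streams
# (full band, lower half, upper half).
rng = random.Random(0)
feats = [[rng.random() for _ in range(40)] for _ in range(100)]
fused = multi_stream_forward(feats, stream_bands=[[0, 1, 2, 3], [0, 1], [2, 3]])
print(len(fused), len(fused[0]))  # 100 8
```

Because each stream's embeddings are mean-normalized before fusion, the fused embeddings also have zero mean over time in every dimension. Streams may select different numbers of bins; in this sketch their encoders map to a shared embedding size so that the outputs can be pooled frame by frame.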
Pages: 819-848
Page count: 30