MULTI-STREAM CONVOLUTIONAL NEURAL NETWORK WITH FREQUENCY SELECTION FOR ROBUST SPEAKER VERIFICATION

Cited by: 0
Authors
Yao, Wei [1]
Chen, Shen [2]
Cui, Jiamin [1]
Lou, Yaolin [1]
Affiliations
[1] Zhejiang Univ Water Resources & Elect Power, Coll Elect Engn, Key Lab Technol Rural Water Management Zhejiang Pr, Hangzhou, Peoples R China
[2] Wanbang Digital Energy Co Ltd China, Hangzhou, Peoples R China
Keywords
Deep learning; speaker verification; convolutional neural network; multi-stream; frequency selection
DOI
10.31577/cai20244819
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker verification aims to verify whether an input utterance belongs to the claimed speaker. Conventionally, such systems are deployed in a single-stream setting, in which the feature extractor operates over the full frequency range. In this paper, we hypothesize that a machine can learn enough to perform the classification task when it listens to only part of the frequency range rather than all of it, a technique we call frequency selection, and we propose a novel multi-stream Convolutional Neural Network (CNN) framework that uses this technique for speaker verification. The proposed framework combines diverse temporal embeddings generated by multiple streams to make acoustic modeling more robust. To diversify the temporal embeddings, we apply feature augmentation with frequency selection: the full frequency band is manually segmented into several sub-bands, and the feature extractor of each stream selects which sub-bands to use as its target frequency domain. Unlike the conventional single-stream solution, in which each utterance is processed only once, this framework processes each utterance with multiple streams in parallel. The input utterance of each stream is pre-processed by a frequency selector restricted to a specified frequency range and post-processed by mean normalization. The normalized temporal embeddings of all streams then flow into a pooling layer that generates the fused embedding. We conduct extensive experiments on the VoxCeleb dataset, and the results show that the multi-stream CNN significantly outperforms the single-stream baseline, with a relative improvement of 20.53% in minimum Decision Cost Function (minDCF) and 15.28% in Equal Error Rate (EER).
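The data flow described in the abstract (frequency selection per stream, mean normalization, then pooling into a fused embedding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the equal-width split into sub-bands, the random linear projection standing in for each stream's CNN, the choice to normalize along the time axis, and the use of standard-deviation pooling are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def select_subbands(features, num_subbands, keep):
    """Split the frequency axis into contiguous sub-bands and keep a subset.

    features: array of shape (frames, freq_bins).
    keep: indices of the sub-bands this stream listens to (assumed interface).
    """
    bands = np.array_split(features, num_subbands, axis=1)
    return np.concatenate([bands[i] for i in keep], axis=1)


def stream_embedding(features, weights):
    """Hypothetical stand-in for a per-stream CNN: frame-wise projection + ReLU."""
    return np.maximum(features @ weights, 0.0)


def mean_normalize(embeddings):
    """Subtract the per-dimension mean over time (assumed normalization axis)."""
    return embeddings - embeddings.mean(axis=0, keepdims=True)


def multi_stream_embedding(features, stream_bands, num_subbands, emb_dim, seed=0):
    """Run every stream on its own sub-bands and pool into one fused vector."""
    rng = np.random.default_rng(seed)
    per_stream = []
    for keep in stream_bands:
        selected = select_subbands(features, num_subbands, keep)
        # Random weights replace a trained feature extractor in this sketch.
        weights = rng.standard_normal((selected.shape[1], emb_dim))
        per_stream.append(mean_normalize(stream_embedding(selected, weights)))
    stacked = np.concatenate(per_stream, axis=1)  # (frames, streams * emb_dim)
    # The temporal mean is zero after normalization, so this sketch pools
    # with the standard deviation over time instead (an assumption).
    return stacked.std(axis=0)


# Toy input: 50 frames of 40 filterbank-like bins, three streams.
features = np.random.default_rng(1).standard_normal((50, 40))
fused = multi_stream_embedding(
    features, [(0, 1), (1, 2, 3), (0, 1, 2, 3)], num_subbands=4, emb_dim=8
)
print(fused.shape)
```

Each stream sees a different slice of the spectrum, so the three streams yield different temporal embeddings for the same utterance. The fused vector here has one entry per stream and embedding dimension.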
Pages: 819-848
Number of pages: 30