MULTI-STREAM CONVOLUTIONAL NEURAL NETWORK WITH FREQUENCY SELECTION FOR ROBUST SPEAKER VERIFICATION

Cited by: 0
Authors
Yao, Wei [1]
Chen, Shen [2]
Cui, Jiamin [1]
Lou, Yaolin [1]
Affiliations
[1] Zhejiang Univ Water Resources & Elect Power, Coll Elect Engn, Key Lab Technol Rural Water Management Zhejiang Pr, Hangzhou, Peoples R China
[2] Wanbang Digital Energy Co Ltd China, Hangzhou, Peoples R China
Keywords
Deep learning; speaker verification; convolutional neural network; multi-stream; frequency selection
DOI
10.31577/cai20244819
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker verification aims to verify whether an input utterance corresponds to the claimed speaker. Conventionally, such systems are single-stream: the feature extractor operates over the full frequency range. In this paper, we hypothesize that a machine can learn enough to perform the classification task when listening to only part of the frequency range rather than all of it, a technique we call frequency selection, and we propose a novel multi-stream Convolutional Neural Network (CNN) framework that uses this technique for speaker verification. The proposed framework combines diverse temporal embeddings generated by multiple streams to make acoustic modeling more robust. To diversify the temporal embeddings, we use feature augmentation with frequency selection: the full frequency band is manually segmented into several sub-bands, and the feature extractor of each stream selects which sub-bands to use as its target frequency domain. Unlike the conventional single-stream solution, in which each utterance is processed only once, this framework processes each utterance with multiple streams in parallel. The input utterance for each stream is pre-processed by a frequency selector restricted to a specified frequency range and post-processed by mean normalization. The normalized temporal embeddings of all streams are then fed into a pooling layer to generate fused embeddings. We conduct extensive experiments on the VoxCeleb dataset, and the results show that the multi-stream CNN significantly outperforms the single-stream baseline, with relative improvements of 20.53% in minimum Decision Cost Function (minDCF) and 15.28% in Equal Error Rate (EER).
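The data flow described in the abstract (sub-band selection, per-stream extraction, mean normalization, pooled fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the equal-width sub-band split, the random-projection stand-in for each stream's CNN, the choice to normalize over time, and the standard-deviation pooling are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def split_subbands(num_bins, num_subbands):
    """Partition the frequency axis into equal-width (start, stop) index pairs.

    Equal widths are an assumption; the abstract only says the split is manual.
    """
    edges = np.linspace(0, num_bins, num_subbands + 1).astype(int)
    return [(int(edges[i]), int(edges[i + 1])) for i in range(num_subbands)]


def select_frequencies(features, bands, chosen):
    """Keep only the chosen sub-bands of a (frames, bins) feature matrix."""
    parts = [features[:, bands[i][0]:bands[i][1]] for i in chosen]
    return np.concatenate(parts, axis=1)


def toy_extractor(x, out_dim, seed):
    """Hypothetical stand-in for a stream's CNN: random projection plus ReLU."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((x.shape[1], out_dim))
    return np.maximum(x @ weights, 0.0)


def multi_stream_embedding(features, stream_bands, num_subbands=4, out_dim=8):
    """Run every stream on its own sub-bands and fuse the results by pooling."""
    bands = split_subbands(features.shape[1], num_subbands)
    per_stream = []
    for seed, chosen in enumerate(stream_bands):
        x = select_frequencies(features, bands, chosen)  # frequency selector
        emb = toy_extractor(x, out_dim, seed)            # (frames, out_dim)
        emb = emb - emb.mean(axis=0, keepdims=True)      # mean normalization (over time, assumed)
        per_stream.append(emb)
    stacked = np.concatenate(per_stream, axis=1)         # (frames, streams * out_dim)
    # Pooling over time (assumed form): the temporal mean is zero after
    # normalization, so this sketch pools with the standard deviation only.
    return stacked.std(axis=0)


# Usage: 50 frames of 40-bin features, one full-band and two partial-band streams.
features = np.random.default_rng(0).standard_normal((50, 40))
fused = multi_stream_embedding(features, [(0, 1, 2, 3), (0, 1), (2, 3)])
print(fused.shape)  # (24,)
```

Each stream sees a different slice of the spectrum, so the fused vector has one block of dimensions per stream. In a real system each `toy_extractor` call would be replaced by a trained network.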
Pages: 819-848
Page count: 30