SI-NET: MULTI-SCALE CONTEXT-AWARE CONVOLUTIONAL BLOCK FOR SPEAKER VERIFICATION

Cited by: 6
Authors
Li, Zhuo [1 ,2 ]
Fang, Ce [1 ,2 ]
Xiao, Runqiu [1 ,2 ]
Wang, Wenchao [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Xinjiang Key Lab Minor Speech & Language Informat, Urumqi, Peoples R China
Keywords
speaker verification; Split-Integration; multi-scale features; dynamic integration; at a granular level;
DOI
10.1109/ASRU51503.2021.9688119
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adequately exploiting multi-scale information is essential for building a high-performance speaker verification (SV) system. Biological research shows that the human auditory system employs a multi-timescale processing mode to extract information and integrates information across scales to encode sound. Inspired by this, we propose a novel block, named Split-Integration (SI), to explore multi-scale context-aware feature learning at a granular level for speaker verification. Our model involves a pair of operations: (i) multi-scale split, which imitates the multi-timescale processing mode by extracting multi-scale features through grouping and stacking filters of different sizes, and (ii) dynamic integration, which mirrors the fusion mechanism by introducing KL divergence to measure the complementarity between multi-scale features, so that the model fully integrates them and produces more speaker-discriminative representations. Experiments are conducted on the VoxCeleb and Speakers in the Wild (SITW) datasets. Results demonstrate that our approach achieves a relative 10%-20% improvement in equal error rate (EER) over a strong baseline on the SV task.
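The abstract describes the two operations only at a high level. Below is a minimal, hypothetical PyTorch-style sketch of a block along these lines, for illustration only: the kernel sizes, the channel grouping, the residual connection, and especially the way the KL-divergence scores are turned into fusion weights are assumptions, not the authors' published SI-Net implementation.

# A minimal, hypothetical sketch of the Split-Integration idea in PyTorch.
# Kernel sizes, channel grouping, and the KL-based fusion weights below are
# illustrative assumptions, not the authors' published implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitIntegration(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.num_scales = len(kernel_sizes)
        width = channels // self.num_scales
        # Multi-scale split: one group of filters per time scale.
        self.branches = nn.ModuleList(
            [nn.Conv1d(width, width, k, padding=k // 2) for k in kernel_sizes]
        )
        self.proj = nn.Conv1d(channels, channels, 1)

    def forward(self, x):  # x: (batch, channels, frames)
        # Split the channels into equal groups and filter each at its own scale.
        splits = torch.chunk(x, self.num_scales, dim=1)
        feats = [conv(s) for conv, s in zip(self.branches, splits)]

        # Dynamic integration (assumed form): score each branch by the KL
        # divergence between its frame-level distribution and the mean
        # distribution over branches, then softmax the scores into weights.
        probs = [F.softmax(f.mean(dim=1), dim=-1) for f in feats]      # (batch, frames)
        mean_prob = torch.stack(probs, dim=0).mean(dim=0)
        scores = torch.stack(
            [F.kl_div(p.log(), mean_prob, reduction="none").sum(dim=-1) for p in probs],
            dim=-1,
        )                                                               # (batch, scales)
        weights = F.softmax(scores, dim=-1)

        # Re-weight each branch, merge the groups back, and add a residual path.
        fused = torch.cat(
            [w.view(-1, 1, 1) * f for w, f in zip(weights.unbind(dim=-1), feats)],
            dim=1,
        )
        return self.proj(fused) + x

# Usage: a 512-channel frame-level feature map of 200 frames, batch of 4.
block = SplitIntegration(channels=512)
out = block(torch.randn(4, 512, 200))
print(out.shape)  # torch.Size([4, 512, 200])

In this sketch, each channel group is filtered at one time scale (the "split"), and a per-branch complementarity score reweights the branches before they are merged back together (the "integration").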
Pages: 220-227
Number of pages: 8
Related Papers
50 records in total
  • [1] Multi-Scale Based Context-Aware Net for Action Detection
    Liu, Haijun
    Wang, Shiguang
    Wang, Wen
    Cheng, Jian
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (02) : 337 - 348
  • [2] CDMC-Net: Context-Aware Image Deblurring Using a Multi-scale Cascaded Network
    Zhao, Qian
    Zhou, Dongming
    Yang, Hao
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (04) : 3985 - 4006
  • [3] Speaker verification using attentive multi-scale convolutional recurrent network
    Li, Yanxiong
    Jiang, Zhongjie
    Cao, Wenchang
    Huang, Qisheng
    [J]. APPLIED SOFT COMPUTING, 2022, 126
  • [4] CAM: CONTEXT-AWARE MASKING FOR ROBUST SPEAKER VERIFICATION
    Yu, Ya-Qi
    Zheng, Siqi
    Suo, Hongbin
    Lei, Yun
    Li, Wu-Jun
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6703 - 6707
  • [5] Bridging Multi-Scale Context-Aware Representation for Object Detection
    Wang, Boying
    Ji, Ruyi
    Zhang, Libo
    Wu, Yanjun
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (05) : 2317 - 2329
  • [6] Multi-scale Fusion with Context-aware Network for Object Detection
    Wang, Hanyuan
    Xu, Jie
    Li, Linke
    Tian, Ye
    Xu, Du
    Xu, Shizhong
    [J]. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 2486 - 2491
  • [7] Multi-scale context-aware network for continuous sign language recognition
    Xue, Senhua
    Gao, Liqing
    Wan, Liang
    Feng, Wei
    [J]. VIRTUAL REALITY & INTELLIGENT HARDWARE, 2024, 6 (04) - 337
  • [8] Context-Aware Multi-Scale Aggregation Network for Congested Crowd Counting
    Huang, Liangjun
    Shen, Shihui
    Zhu, Luning
    Shi, Qingxuan
    Zhang, Jianwei
    [J]. SENSORS, 2022, 22 (09)
  • [9] Multi-scale inputs and context-aware aggregation network for stereo matching
    Shi, Liqing
    Xiong, Taiping
    Cui, Gengshen
    Pan, Minghua
    Cheng, Nuo
    Wu, Xiangjie
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (30) : 75171 - 75194