Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios

Authors
Xuan, Xi [1]
Han, Runping [2]
Gao, Jingxin [1]
Affiliations
[1] School of Arts and Sciences, Beijing Institute of Fashion Technology, Beijing 100029, China
[2] School of Fashion, Beijing Institute of Fashion Technology, Beijing 100029, China
Keywords
Real-time systems; Speech recognition
DOI
Not available
Abstract
Speaker verification systems perform poorly in scenarios involving cross-domain, long-duration, and noisy utterances. To address this, this paper presents PMS-Conformer, a real-time, robust speaker recognition model based on Conformer. Its architecture is inspired by the state-of-the-art MFA-Conformer and improves on that model's acoustic feature extractor, network components, and loss calculation module, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baseline MFA-Conformer and with ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies (cross-domain utterances), SITW (long-duration utterances), and VoxCeleb-O with noise added to its utterances, the automatic speaker verification (ASV) system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the number of trainable parameters and the real-time factor (RTF) of the PMS-Conformer speaker embedding extractor are significantly lower than those of ECAPA-TDNN. Overall, the evaluation results demonstrate that PMS-Conformer performs well in real-time, multi-scenario settings. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
Pages: 147-156