Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios

Cited by: 0
Authors
Xuan, Xi [1]
Han, Runping [2]
Gao, Jingxin [1]
Affiliations
[1] School of Arts and Sciences, Beijing Institute of Fashion Technology, Beijing 100029, China
[2] School of Fashion, Beijing Institute of Fashion Technology, Beijing 100029, China
Keywords
Real time systems; Speech recognition
Abstract
Speaker verification systems perform poorly in scenarios involving cross-domain, long-duration, and noisy utterances. To address this, this paper presents PMS-Conformer, a real-time, robust speaker recognition model based on Conformer. Its architecture is inspired by the state-of-the-art MFA-Conformer and improves on that model's acoustic feature extractor, network components, and loss calculation module, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baseline MFA-Conformer and with ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies (cross-domain utterances), SITW (long-duration utterances), and VoxCeleb-O with noise added to its utterances, the automatic speaker verification (ASV) system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the number of trainable parameters and the real-time factor (RTF) of the PMS-Conformer speaker embedding extractor are significantly lower than those of ECAPA-TDNN. Overall, the evaluation results demonstrate that PMS-Conformer performs well in real-time, multi-scenario settings. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
Pages: 147-156