Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios

Cited by: 0

Authors:

Xuan, Xi [1]

Han, Runping [2]

Gao, Jingxin [1]

Affiliations:

[1] School of Arts and Sciences, Beijing Institute of Fashion Technology, Beijing 100029, China

[2] School of Fashion, Beijing Institute of Fashion Technology, Beijing 100029, China

Source:

Computer Engineering and Applications | 2024, Vol. 60, No. 07

Keywords:

Real time systems - Speech recognition;

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Speaker verification systems perform poorly in scenarios involving cross-domain, long-duration, and noisy utterances. To address this, this paper designs PMS-Conformer, a real-time, robust speaker recognition model based on Conformer. Its architecture is inspired by the state-of-the-art MFA-Conformer. PMS-Conformer improves on MFA-Conformer's acoustic feature extractor, network components, and loss calculation module, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baseline MFA-Conformer and with ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies (cross-domain utterances), SITW (long-duration utterances), and VoxCeleb-O with noise added to its utterances, the automatic speaker verification (ASV) system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the number of trainable parameters and the real-time factor (RTF) of the PMS-Conformer speaker embedding extractor are significantly lower than those of ECAPA-TDNN. Together, the evaluation results demonstrate that PMS-Conformer performs well in real-time multi-scenario settings. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
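
The abstract compares models by real-time factor and trainable parameter count without defining either. The sketch below shows how these two quantities are commonly computed; it assumes the usual definition of RTF (processing time divided by the duration of the audio processed) and counts parameters from tensor shapes. The function names and the input format are hypothetical illustrations and are not taken from the paper.

```python
import math


def real_time_factor(processing_seconds, audio_seconds):
    """Assumed definition: RTF = processing time / audio duration.

    Values below 1.0 mean the audio is processed faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def count_trainable_params(params):
    """Sum element counts over tensors flagged as trainable.

    `params` is an iterable of (shape, trainable) pairs, a hypothetical
    stand-in for a framework's parameter list.
    """
    return sum(math.prod(shape) for shape, trainable in params if trainable)


if __name__ == "__main__":
    # 0.5 s of compute spent on a 10 s utterance.
    print(real_time_factor(0.5, 10.0))
    # One trainable 3x4 weight matrix and one frozen bias of length 5.
    print(count_trainable_params([((3, 4), True), ((5,), False)]))
```

Lower values of both quantities indicate a lighter and faster embedding extractor, which is the sense in which the abstract reports them against ECAPA-TDNN.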


Pages: 147-156
