Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios

Cited by: 0

Authors:

Xuan, Xi [1]

Han, Runping [2]

Gao, Jingxin [1]

Affiliations:

[1] School of Arts and Sciences, Beijing Institute of Fashion Technology, Beijing 100029, China

[2] School of Fashion, Beijing Institute of Fashion Technology, Beijing 100029, China

Source:

Computer Engineering and Applications | 2024, Vol. 60, No. 07

Keywords:

Real time systems; Speech recognition

DOI:

Not available

Abstract:

Speaker verification systems perform poorly in scenarios involving cross-domain, long-duration, and noisy utterances. To address this, this paper presents PMS-Conformer, a real-time, robust speaker recognition model based on Conformer. Its architecture is inspired by the state-of-the-art MFA-Conformer and improves on that model's acoustic feature extractor, network components, and loss calculation module, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baseline MFA-Conformer and ECAPA-TDNN in extensive speaker verification experiments. The results show that the automatic speaker verification (ASV) system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN on VoxMovies (cross-domain utterances), SITW (long-duration utterances), and VoxCeleb-O with noise added to its utterances. Moreover, the number of trainable parameters (Params) and the real-time factor (RTF) of the PMS-Conformer speaker embedding extractor are significantly lower than those of ECAPA-TDNN. Together, these evaluations demonstrate that PMS-Conformer performs well in real-time, multi-scenario settings. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.

Pages: 147-156
