Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios

Cited by: 0

Authors:

Xuan, Xi [1]

Han, Runping [2]

Gao, Jingxin [1]

Affiliations:

[1] School of Arts and Sciences, Beijing Institute of Fashion Technology, Beijing 100029, China

[2] School of Fashion, Beijing Institute of Fashion Technology, Beijing 100029, China

Source:

Computer Engineering and Applications | 2024 / Vol. 60 / No. 07

Keywords:

Real time systems - Speech recognition;

DOI:

Not available

CLC number:

Subject classification code:

Abstract:

Speaker verification systems perform poorly in several scenarios, namely cross-domain utterances, long-duration utterances, and noisy utterances. To address this, this paper designs PMS-Conformer, a real-time, robust speaker recognition model based on Conformer. Its architecture is inspired by the state-of-the-art MFA-Conformer model. PMS-Conformer improves on MFA-Conformer's acoustic feature extractor, network components, and loss calculation module, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baseline MFA-Conformer and ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies (cross-domain utterances), SITW (long-duration utterances), and VoxCeleb-O with noise added to its utterances, the ASV system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the number of trainable parameters and the real-time factor (RTF) of the PMS-Conformer speaker embedding extractor are significantly lower than those of ECAPA-TDNN. Overall, the evaluation results demonstrate that PMS-Conformer performs well in real-time, multi-scenario settings. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
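The abstract reports RTF without stating how it was measured. As a rough illustration only, the sketch below uses the common definition of real-time factor, processing time divided by audio duration, so a value below 1 means faster than real time. The function names and the stand-in `dummy_extractor` are hypothetical and are not taken from the paper.

```python
import time
from typing import Callable, Sequence


def rtf_from_timings(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor under the common definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return processing_seconds / audio_seconds


def measure_rtf(
    extractor: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate: int,
) -> float:
    """Time one call to `extractor` on `samples` and return its RTF."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()
    extractor(samples)  # hypothetical stand-in for an embedding extractor
    elapsed = time.perf_counter() - start
    return rtf_from_timings(elapsed, audio_seconds)


def dummy_extractor(samples: Sequence[float]) -> float:
    """Placeholder workload (assumption): not a real speaker model."""
    return sum(samples)


if __name__ == "__main__":
    one_second_of_silence = [0.0] * 16000  # 1 s of audio at an assumed 16 kHz
    print(measure_rtf(dummy_extractor, one_second_of_silence, 16000))
```

In practice the measurement would typically be averaged over many utterances on fixed hardware, since a single timing is noisy.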


Pages: 147-156
