Fast and Lightweight Voice Replay Attack Detection via Time-Frequency Spectrum Difference

Cited by: 0

Authors:

He, Ruiwen [1]

Cheng, Yushi [2]

Zheng, Zhicong [1]

Ji, Xiaoyu [1]

Xu, Wenyuan [1]

Affiliations:

[1] Zhejiang Univ, Coll Elect Engn, Ubiquitous Syst Secur Lab, Hangzhou 310027, Peoples R China

[2] Zhejiang Univ, ZJU UIUC Inst, Ubiquitous Syst Secur Lab, Hangzhou 310027, Peoples R China

Source:

IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / Issue 18

Funding:

National Natural Science Foundation of China;

Keywords:

Acoustic feature; defense; replay attack; security measurement; statistical analysis;

DOI:

10.1109/JIOT.2024.3406962

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Due to the open nature of voice and voice interfaces, an adversary can spoof voice recognition systems by replaying prerecorded voice commands from legitimate users, an attack known as the voice replay attack. Existing detection methods mainly rely on extra hardware to determine the sound source or require excessive computing resources to train a classifier with abundant acoustic features. In this article, we propose Anti-Replay, a fast and lightweight detection system for voice replay attacks. To overcome the challenge of redundant classification features and complex calculation, we first investigate the time-frequency spectrum difference between the genuine human voice and the replayed audio, which is caused by the nonlinear distortion of the attacker's microphones and speakers. Then, we design a total of 77 features of 5 types in both the time and frequency domains and propose a convolutional neural network classifier, SE-ResNet50, for attack detection. Evaluations on the ASVspoof2017, ASVspoof2019, and ASVspoof2021 data sets demonstrate that Anti-Replay achieves an average equal error rate (EER) of 1.36% across the three data sets. Meanwhile, Anti-Replay decreases the training time by 52.3% and 90.2% and the model size by 83.5% and 99.9% compared with the baseline model, constant-Q cepstral coefficient-Gaussian mixture model, and the state-of-the-art method, Res2Net, respectively. We have also confirmed that our system is effective in detecting the adaptive replay attack.
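
The abstract reports its accuracy as an equal error rate (EER), the operating point at which the false acceptance rate (replayed audio accepted as genuine) equals the false rejection rate (genuine voice rejected). The following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: a sample is accepted as genuine when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine samples scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: replayed samples scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Average the two rates, since they rarely coincide exactly
            # on a finite score set.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical toy scores, not data from the paper.
genuine = [0.9, 0.8, 0.7, 0.4]
replayed = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, replayed)
print(eer, threshold)  # 0.25 at threshold 0.6 for this toy data
```

On a finite score set the two error rates usually cross between thresholds, so this sketch reports their average at the closest point; evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.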


Pages: 29798-29810

Page count: 13
