Fast and Lightweight Voice Replay Attack Detection via Time-Frequency Spectrum Difference

Authors
He, Ruiwen [1]
Cheng, Yushi [2]
Zheng, Zhicong [1]
Ji, Xiaoyu [1]
Xu, Wenyuan [1]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Ubiquitous Syst Secur Lab, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, ZJU UIUC Inst, Ubiquitous Syst Secur Lab, Hangzhou 310027, Peoples R China
Source
IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / No. 18
Funding
National Natural Science Foundation of China;
Keywords
Acoustic feature; defense; replay attack; security measurement; statistical analysis;
DOI
10.1109/JIOT.2024.3406962
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Due to the open nature of voice and voice interfaces, an adversary can spoof voice recognition systems by replaying prerecorded voice commands of legitimate users, an attack known as the voice replay attack. Existing detection methods against voice replay attacks mainly rely on extra hardware to determine the sound source or require excessive computing resources to train a classifier on abundant acoustic features. In this article, we propose Anti-Replay, a fast and lightweight detection system for voice replay attacks. To avoid redundant classification features and complex computation, we first investigate the time-frequency spectrum differences between genuine human voice and replayed audio, which arise from the nonlinear distortion introduced by the attacker's microphones and speakers. We then design 77 features of five types spanning the time and frequency domains and propose SE-ResNet50, a convolutional neural network classifier, for attack detection. Evaluations on the ASVspoof2017, ASVspoof2019, and ASVspoof2021 data sets show that Anti-Replay achieves an average equal error rate (EER) of 1.36% across the three data sets. Compared with the baseline constant-Q cepstral coefficient-Gaussian mixture model (CQCC-GMM) and the state-of-the-art method Res2Net, Anti-Replay reduces training time by 52.3% and 90.2%, respectively, and model size by 83.5% and 99.9%, respectively. We also confirm that our system is effective in detecting adaptive replay attacks.
Pages: 29798-29810
Number of pages: 13
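The abstract reports detection performance as an equal error rate (EER): the operating point at which the false acceptance rate (replayed audio accepted as genuine) equals the false rejection rate (genuine voice rejected). The following minimal sketch estimates an EER from classifier scores by sweeping a decision threshold over the observed scores. It is a generic illustration, not the authors' evaluation code or an official ASVspoof scoring tool. The function name `compute_eer`, the label convention (1 = genuine, 0 = replayed), the "higher score means more likely genuine" convention, and the toy scores are all hypothetical assumptions.

```python
from typing import List, Sequence, Tuple


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the equal error rate (EER) by a threshold sweep.

    Hypothetical conventions assumed by this sketch:
      * labels[i] == 1 means genuine (bona fide) speech, 0 means replayed audio;
      * a higher score means "more likely genuine";
      * a sample is accepted as genuine when score >= threshold.

    Returns (eer, threshold) at the observed threshold where the false
    acceptance rate and false rejection rate are closest; the EER is
    reported as their mean at that point.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    genuine: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    replayed: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not replayed:
        raise ValueError("need at least one genuine and one replayed sample")

    best_gap = float("inf")
    best_eer, best_threshold = 1.0, 0.0
    for t in sorted(set(scores)):
        # False acceptance: replayed audio scored at or above the threshold.
        far = sum(s >= t for s in replayed) / len(replayed)
        # False rejection: genuine speech scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer, best_threshold = (far + frr) / 2, t
    return best_eer, best_threshold


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    demo_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
    demo_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    eer, thr = compute_eer(demo_scores, demo_labels)
    print(f"EER = {eer:.2%} at threshold {thr}")  # EER = 25.00% at threshold 0.6
```

Because this sweep only evaluates thresholds at observed scores, the two error rates may never be exactly equal on a small score set; in that case the sketch reports their mean at the closest point, so it should be read as an approximation.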