Deep generative variational autoencoding for replay spoof detection in automatic speaker verification

Times cited: 15
Authors
Chettri, Bhusan [1, 2]
Kinnunen, Tomi [1]
Benetos, Emmanouil [2]
Affiliations
[1] Univ Eastern Finland, Sch Comp, FI-80101 Joensuu, Finland
[2] Queen Mary Univ London, Sch EECS, London, England
Source
Computer Speech and Language, 2020, Vol. 63
Funding
Academy of Finland
Keywords
Anti-spoofing; Presentation attack detection; Replay attack; Countermeasures; Deep generative models; Representations; Classification
DOI
10.1016/j.csl.2020.101092
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. Most of them are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning for classification. Potential benefits include the ability to sample new data and to gain insight into the latent features of genuine and spoofed speech. To this end, we propose to use variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class conditioning. The first, similar to the use of Gaussian mixture models (GMMs) in spoof detection, independently trains two VAEs, one for each class. The second trains a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers substantial improvement over training two separate VAEs, one for each class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both the equal error rate (EER) and the tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals, the absolute difference between the original input and its reconstruction, as features for spoofing detection. The proposed frontend approach, combined with a convolutional neural network classifier, demonstrated substantial improvement over the VAE backend use case.
© 2020 Elsevier Ltd. All rights reserved.
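The abstract describes the C-VAE only as "injecting a one-hot class label vector into the encoder and decoder networks", plus residual features |x - x̂|. The following is a minimal NumPy sketch of those two ideas, not the authors' implementation. Several things are assumptions not stated in this record: the linear, untrained encoder/decoder weights; all names (`ToyCVAE`, `one_hot`, `cm_score`, `BONA_FIDE`, `SPOOF`); the label convention 0 = bona fide, 1 = spoof; a unit-variance Gaussian decoder; and a detection score taken as the difference of the two class-conditioned ELBOs, by analogy with a GMM log-likelihood ratio.

```python
import numpy as np

# Hypothetical label convention (not from the record): 0 = bona fide, 1 = spoof.
BONA_FIDE, SPOOF = 0, 1


def one_hot(label: int, num_classes: int = 2) -> np.ndarray:
    """Return a one-hot vector encoding the class label."""
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v


class ToyCVAE:
    """Hypothetical linear C-VAE with fixed random (untrained) weights.

    The one-hot label is concatenated to the encoder input [x; y] and to
    the decoder input [z; y]; a real model would use trained deep networks.
    """

    def __init__(self, x_dim: int, z_dim: int, num_classes: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_classes = num_classes
        in_dim = x_dim + num_classes
        self.W_mu = rng.normal(scale=0.1, size=(z_dim, in_dim))      # encoder mean head
        self.W_logvar = rng.normal(scale=0.1, size=(z_dim, in_dim))  # encoder log-variance head
        self.W_dec = rng.normal(scale=0.1, size=(x_dim, z_dim + num_classes))

    def encode(self, x: np.ndarray, y: int):
        h = np.concatenate([x, one_hot(y, self.num_classes)])
        return self.W_mu @ h, self.W_logvar @ h

    def decode(self, z: np.ndarray, y: int) -> np.ndarray:
        return self.W_dec @ np.concatenate([z, one_hot(y, self.num_classes)])

    def reconstruct(self, x: np.ndarray, y: int) -> np.ndarray:
        # Deterministic reconstruction: decode the posterior mean, no sampling.
        mu, _ = self.encode(x, y)
        return self.decode(mu, y)

    def elbo(self, x: np.ndarray, y: int) -> float:
        mu, log_var = self.encode(x, y)
        # Reconstruction term: unit-variance Gaussian log-likelihood (up to a
        # constant), evaluated at the posterior mean as an approximation.
        rec = -0.5 * np.sum((x - self.decode(mu, y)) ** 2)
        # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), always >= 0.
        kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
        return float(rec - kl)

    def residual(self, x: np.ndarray, y: int) -> np.ndarray:
        # VAE residual feature: element-wise |x - x_hat|.
        return np.abs(x - self.reconstruct(x, y))


def cm_score(model: ToyCVAE, x: np.ndarray) -> float:
    # Assumed scoring rule: higher means "more bona fide", with each ELBO
    # standing in for log p(x | class) as in a GMM log-likelihood ratio.
    return model.elbo(x, BONA_FIDE) - model.elbo(x, SPOOF)


# Usage on one random 20-dimensional vector standing in for a CQCC frame.
x = np.random.default_rng(1).normal(size=20)
model = ToyCVAE(x_dim=20, z_dim=4)
print(cm_score(model, x))
print(model.residual(x, BONA_FIDE).shape)  # (20,)
```

The sketch only shows where the label enters the encoder and decoder and how two label-conditioned evaluations of one model combine into a single score; in practice the weights would be learned by maximizing the ELBO over labelled training frames, and the residuals would be passed to a separate classifier.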
Pages: 18
Related papers
  • [1] Optimized deep network based spoof detection in automatic speaker verification system
    Neelima, Medikonda
    Prabha, I. Santi
    Multimedia Tools and Applications, 2024, 83(5): 13073-13091
  • [2] Automatic speaker verification systems and spoof detection techniques: review and analysis
    Mittal, Aakshi
    Dua, Mohit
    International Journal of Speech Technology, 2022, 25(1): 105-134
  • [3] A Survey on Replay Attack Detection for Automatic Speaker Verification (ASV) System
    Patil, Hemant A.
    Kamble, Madhu R.
    2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2018: 1047-1053
  • [4] Replay spoof detection for speaker verification system using magnitude-phase-instantaneous frequency and energy features
    Bharath, K. P.
    Kumar, M. Rajesh
    Multimedia Tools and Applications, 2022, 81(27): 39343-39366
  • [5] A New Replay Attack Against Automatic Speaker Verification Systems
    Yoon, Sung-Hyun
    Koh, Min-Sung
    Park, Jae-Han
    Yu, Ha-Jin
    IEEE Access, 2020, 8: 36080-36088
  • [6] An assessment of automatic speaker verification vulnerabilities to replay spoofing attacks
    Janicki, Artur
    Alegre, Federico
    Evans, Nicholas
    Security and Communication Networks, 2016, 9(15): 3030-3044
  • [7] An approach to detect replay attack in automatic speaker verification system
    Saranya, S.
    Bharathi, B.
    Kavitha, S.
    2018 2nd International Conference on Computer, Communication, and Signal Processing (ICCCSP): Special Focus on Technology and Innovation for Smart Environment, 2018: 91-95