FOOLING END-TO-END SPEAKER VERIFICATION WITH ADVERSARIAL EXAMPLES

Citations: 0
Authors
Kreuk, Felix [1 ]
Adi, Yossi [1 ]
Cisse, Moustapha [2 ]
Keshet, Joseph [1 ]
Affiliations
[1] Bar Ilan Univ, Ramat Gan, Israel
[2] Facebook AI Res, Paris, France
Keywords
Automatic speaker verification; adversarial examples;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that the two are almost indistinguishable to a human listener. Yet the generated waveforms, which sound like speaker A, can be used to fool such a system into accepting them as if they were uttered by speaker B. We present white-box attacks on a deep end-to-end network trained on either YOHO or NTIMIT. We also present two black-box attacks. In the first, we generate adversarial examples with a system trained on NTIMIT and attack a system trained on YOHO. In the second, we generate the adversarial examples with a system trained on Mel-spectrum features and attack a system trained on MFCCs. Our results show that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with a different system, potentially using different features.
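The white-box attack the abstract describes, adding a small noise to a waveform so a verification score flips, can be sketched with a gradient-sign (FGSM-style) step. The snippet below is a minimal illustration, not the paper's implementation: the "model" is a hypothetical linear similarity score (dot product with a speaker template), so its input gradient is available in closed form, and `fgsm_perturb` is an assumed helper name.

```python
import numpy as np

def fgsm_perturb(waveform, grad, epsilon=0.01):
    """FGSM-style step: move each sample by at most epsilon
    in the sign direction of the score's input gradient."""
    return waveform + epsilon * np.sign(grad)

# Toy stand-in for an end-to-end verification network:
# score(x) = w @ x, where w is the target speaker's template.
rng = np.random.default_rng(0)
w = rng.standard_normal(16000)   # target speaker "embedding"
x = rng.standard_normal(16000)   # impostor waveform

# For this linear score the gradient w.r.t. the input x is simply w.
score_before = float(w @ x)
x_adv = fgsm_perturb(x, grad=w, epsilon=0.01)
score_after = float(w @ x_adv)

# The perturbation is bounded per sample, yet raises the match score.
assert np.max(np.abs(x_adv - x)) <= 0.01 + 1e-12
assert score_after > score_before
```

In the paper's setting the gradient would instead come from backpropagating the verification loss through the trained network; the per-sample epsilon bound is what keeps the adversarial waveform nearly indistinguishable from the original.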
Pages: 1962-1966
Page count: 5