FOOLING END-TO-END SPEAKER VERIFICATION WITH ADVERSARIAL EXAMPLES

Authors
Kreuk, Felix [1]
Adi, Yossi [1]
Cisse, Moustapha [2]
Keshet, Joseph [1]
Affiliations
[1] Bar Ilan Univ, Ramat Gan, Israel
[2] Facebook AI Res, Paris, France
Keywords
Automatic speaker verification; adversarial examples
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that the two are almost indistinguishable to a human listener. Yet the generated waveforms, which sound like speaker A, can be used to fool such a system into accepting them as if they were uttered by speaker B. We present white-box attacks on a deep end-to-end network that was trained on either YOHO or NTIMIT. We also present two black-box attacks. In the first, we generate adversarial examples with a system trained on NTIMIT and perform the attack on a system that was trained on YOHO. In the second, we generate the adversarial examples with a system trained using Mel-spectrum features and perform the attack on a system trained using MFCCs. Our results show that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with a different system, potentially using different features.
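The abstract does not name the algorithm used to craft the noise. As a rough illustration only, the sketch below applies a targeted gradient-sign perturbation (in the spirit of the fast gradient sign method, which is an assumption here, not a detail taken from the record) to a toy logistic scorer. The scorer, its weights `w` and `b`, the input `x`, and the budget `eps` are all hypothetical stand-ins for a real end-to-end verification network and its acoustic features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    # Toy "verification" score: probability that x belongs to the claimed speaker.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def gradient_sign_attack(x, w, b, y_target, eps):
    # For a logistic scorer, the gradient of the cross-entropy loss
    # with respect to the input is (p - y_target) * w.
    p = score(x, w, b)
    grad = [(p - y_target) * wi for wi in w]
    # Step against the gradient to lower the loss for the target label,
    # changing each input coordinate by at most eps.
    return [xi - eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical weights and an input the toy scorer rejects (score below 0.5).
w = [0.8, -1.2, 0.5, -0.3]
b = 0.0
x = [-0.05, 0.05, -0.05, 0.05]

x_adv = gradient_sign_attack(x, w, b, y_target=1.0, eps=0.1)
score_before = score(x, w, b)
score_after = score(x_adv, w, b)
print(score_before, score_after)
```

In this toy setting the perturbation is bounded by `eps` per coordinate, yet it moves the score across the 0.5 acceptance threshold. A black-box variant, as described in the abstract, would compute the gradient on a different (surrogate) model than the one being attacked.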
Pages: 1962-1966
Number of pages: 5