Fast speech adversarial example generation for keyword spotting system with conditional GAN

Times cited: 4
Authors
Wang, Donghua [1]
Dong, Li [1]
Wang, Rangding [1]
Yan, Diqun [1]
Affiliation
[1] Ningbo Univ, Fac Elect Engn & Comp Sci, Ningbo 315211, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
Adversarial attack; Speech adversarial examples; Conditional generative adversarial network; Keyword spotting (KWS);
DOI
10.1016/j.comcom.2021.08.010
CLC classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deep network-based keyword spotting (KWS) has achieved great success in many speech assistant applications. However, such network-based KWS systems have been shown to be vulnerable to adversarial attacks. In this work, we propose to use a conditional generative adversarial network (CGAN) to efficiently craft targeted speech adversarial examples. Specifically, we first transform the attack's target label into a vector, which serves as the condition input of the CGAN. The generator in the CGAN is tasked with generating a perturbation that causes the adversarial example to be misclassified as the pre-specified target keyword, while simultaneously deceiving the discriminator into classifying the adversarial example as genuine. The discriminator aims to distinguish the crafted adversarial examples from legitimate samples. Second, the target network-based KWS classifier(s) are ensembled and integrated into the proposed CGAN framework to force the generator to construct model-independent perturbations. The classification error loss of the target KWS is back-propagated as gradients that guide the weight updates of the generator. Finally, with a properly devised network architecture and training procedure, we obtain a well-trained generator that produces the adversarial perturbation for a given speech clip and target label. Experimental results show that the crafted adversarial examples can effectively attack the state-of-the-art KWS system with a high attack success rate, while attaining acceptable perceptual quality.
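The objectives described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. All of the following are hypothetical: the one-hot label encoding, conditioning by concatenating that vector to the input features, the non-saturating GAN loss, averaging the targeted cross-entropy over the ensembled KWS models, the perturbation-energy penalty, its weights `alpha`/`beta`, the waveform range used for clipping, and every function name.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guards against log(0)


def one_hot(target: int, num_classes: int) -> List[float]:
    """Encode the attack's target keyword label as a one-hot condition vector."""
    if not 0 <= target < num_classes:
        raise ValueError("target label out of range")
    vec = [0.0] * num_classes
    vec[target] = 1.0
    return vec


def conditioned_input(features: Sequence[float], target: int, num_classes: int) -> List[float]:
    """Hypothetical conditioning: append the label vector to the input features."""
    return list(features) + one_hot(target, num_classes)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """D should output ~1 for legitimate clips and ~0 for adversarial ones."""
    return -math.log(d_real + EPS) - math.log(1.0 - d_fake + EPS)


def generator_loss(
    d_fake: float,
    ensemble_probs: Sequence[Sequence[float]],
    target: int,
    perturbation: Sequence[float],
    alpha: float = 1.0,  # assumed weight of the GAN term
    beta: float = 0.1,   # assumed weight of the perturbation penalty
) -> float:
    """Hypothetical weighted generator objective.

    d_fake: discriminator's probability that x + delta is genuine.
    ensemble_probs: softmax outputs of each target KWS model on x + delta.
    """
    # Fool the discriminator (non-saturating GAN loss).
    l_gan = -math.log(d_fake + EPS)
    # Targeted misclassification: cross-entropy toward the target keyword,
    # averaged over the ensembled KWS classifiers.
    l_adv = sum(-math.log(p[target] + EPS) for p in ensemble_probs) / len(ensemble_probs)
    # Assumed perturbation-energy penalty to keep the audible change small.
    l_pert = sum(d * d for d in perturbation) / len(perturbation)
    return l_adv + alpha * l_gan + beta * l_pert


def apply_perturbation(clip: Sequence[float], delta: Sequence[float], limit: float = 1.0) -> List[float]:
    """Generation step: add the generator's perturbation and clip to an assumed [-limit, limit] range."""
    if len(clip) != len(delta):
        raise ValueError("clip and perturbation lengths differ")
    return [max(-limit, min(limit, s + d)) for s, d in zip(clip, delta)]


if __name__ == "__main__":
    probs = [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2]]  # two toy ensemble members, target = class 1
    print(generator_loss(0.5, probs, target=1, perturbation=[0.01, -0.02]))
```

Averaging the targeted loss over several classifiers is one simple way to push the generator toward perturbations that are not tied to a single model, in the spirit of the ensemble described above. Once training is done, crafting an example needs only one generator forward pass followed by `apply_perturbation`, with no per-example optimization.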
Pages: 145-156
Page count: 12