Fast speech adversarial example generation for keyword spotting system with conditional GAN

Cited by: 4

Authors:

Wang, Donghua [1]

Dong, Li [1]

Wang, Rangding [1]

Yan, Diqun [1]

Affiliations:

[1] Ningbo Univ, Fac Elect Engn & Comp Sci, Ningbo 315211, Zhejiang, Peoples R China

Source:

COMPUTER COMMUNICATIONS | 2021 / Vol. 179 / Issue 179

Funding:

National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;

Keywords:

Adversarial attack; Speech adversarial examples; Conditional generative adversarial network; Keyword spotting (KWS);

DOI:

10.1016/j.comcom.2021.08.010

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Deep network-based keyword spotting (KWS) has achieved great success in many speech assistant applications. However, such network-based KWS systems have been shown to be vulnerable to adversarial attacks. In this work, we propose to use a conditional generative adversarial network (CGAN) to efficiently craft targeted speech adversarial examples. Specifically, we first transform the attack target label into a vector, which serves as the condition input of the CGAN. The generator in the CGAN is tasked with generating a perturbation that makes the adversarial example be misclassified as the pre-specified target keyword, while simultaneously deceiving the discriminator into classifying the adversarial example as genuine. The discriminator aims to distinguish the crafted adversarial examples from legitimate samples. Secondly, the target network-based KWS classifier(s) are ensembled and integrated into the proposed CGAN framework to force the generator to construct model-independent perturbations. The classification error loss of the target KWS is back-propagated to guide the weight update of the generator. Finally, with a properly devised network architecture and training procedure, we obtain a well-trained generator that produces the adversarial perturbation for a given speech clip and target label. Experimental results show that the crafted adversarial examples can effectively attack the state-of-the-art KWS system with a high attack success rate, while attaining acceptable perceptual quality.
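
The generator objective described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the abstract gives no loss formula, so the weighting factor `alpha`, the perturbation bound `epsilon`, the [-1, 1] sample range, and all function names are assumptions made for the example.

```python
import math


def one_hot(target_index, num_classes):
    """Encode the attack target label as a condition vector (one-hot is an assumption)."""
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    return [1.0 if i == target_index else 0.0 for i in range(num_classes)]


def apply_perturbation(clip, perturbation, epsilon=0.05):
    """Add a perturbation to a speech clip.

    Assumed for illustration: each perturbation sample is bounded to
    [-epsilon, epsilon], and audio samples are kept in [-1, 1].
    """
    adversarial = []
    for sample, delta in zip(clip, perturbation):
        delta = max(-epsilon, min(epsilon, delta))
        adversarial.append(max(-1.0, min(1.0, sample + delta)))
    return adversarial


def cross_entropy(probs, target_index, floor=1e-12):
    """Classification error of one KWS model with respect to the target keyword."""
    return -math.log(max(probs[target_index], floor))


def generator_loss(disc_prob_genuine, ensemble_probs, target_index, alpha=1.0):
    """Combine the two generator goals named in the abstract.

    disc_prob_genuine: discriminator's probability that the adversarial
        example is genuine (the generator wants this high).
    ensemble_probs: one class-probability list per target KWS model; the
        classification error is averaged over the ensemble.
    alpha: hypothetical weight balancing the two terms.
    """
    gan_term = -math.log(max(disc_prob_genuine, 1e-12))
    attack_term = sum(
        cross_entropy(probs, target_index) for probs in ensemble_probs
    ) / len(ensemble_probs)
    return gan_term + alpha * attack_term
```

In an actual training loop, a loss of this kind would be minimized with respect to the generator weights by a deep learning framework, while the discriminator is trained separately to tell adversarial examples from legitimate clips.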

Pages: 145-156

Page count: 12
