Towards Decision-based Sparse Attacks on Video Recognition

Cited by: 1

Authors:

Jiang, Kaixun [1]

Chen, Zhaoyu [1]

Zhou, Xinyu [2]

Zhang, Jingyu [1]

Hong, Lingyi [2]

Li, Bo [2,3]

Wang, Yan [1]

Zhang, Wenqiang [1]

Affiliations:

[1] Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China

[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China

[3] Vivo Mobile Commun Co Ltd, Dongguan, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

adversarial examples; video action recognition; sparse attacks;

DOI:

10.1145/3581783.3611828

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent studies indicate that sparse attacks, which modify only a small set of pixels in the input under an ℓ0-norm constraint, threaten the security of deep learning models. While existing research has primarily focused on sparse attacks against image models, there is a notable gap in evaluating the robustness of video recognition models. To bridge this gap, we are the first to study sparse video attacks and propose an attack framework named V-DSA for the most challenging decision-based setting, in which the threat model returns only the predicted hard label. Specifically, V-DSA comprises two modules: a Cross-Modal Generator (CMG) for query-free transfer attacks on each frame and an Optical flow Grouping Evolution algorithm (OGE) for query-efficient spatial-temporal attacks. CMG processes each frame to generate a transfer video that serves as the starting point of the attack, exploiting the feature similarity between image classification and video recognition models. OGE first initializes populations from the transfer video and then leverages optical flow to establish the temporal connection of the perturbed pixels in each frame, which reduces the parameter space and specifically breaks the temporal relationship between frames. Finally, OGE complements this optical flow modeling with grouping evolution, which realizes a coarse-to-fine attack and avoids falling into local optima. In addition, OGE keeps the perturbation temporally coherent while balancing the number of perturbed pixels per frame, further increasing the imperceptibility of the attack. Extensive experiments demonstrate that V-DSA achieves state-of-the-art performance in terms of both threat effectiveness and imperceptibility. We hope V-DSA can provide valuable insights into the security of video recognition systems.
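
Illustrative note: the decision-based sparse setting described in the abstract can be sketched with a toy example. The code below is not V-DSA (it has no CMG, optical flow, or grouping evolution); it is a generic random-search sketch that assumes an already misclassified starting video and shrinks the set of perturbed pixels using hard-label feedback only. The names `hard_label_oracle` and `sparse_decision_attack`, the mean-threshold "classifier", and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def hard_label_oracle(video, threshold=0.5):
    """Hypothetical stand-in for a video classifier: returns only a class id."""
    return int(video.mean() > threshold)

def sparse_decision_attack(clean, start, true_label, oracle, steps=2000, seed=0):
    """Shrink the set of perturbed pixels while the oracle stays fooled.

    clean, start: arrays of shape (T, H, W); `start` must already be misclassified.
    Returns (adversarial video, boolean mask of perturbed pixels, queries used).
    """
    rng = np.random.default_rng(seed)
    mask = clean != start                  # pixels currently perturbed
    queries = 0
    for _ in range(steps):
        on = np.flatnonzero(mask)
        if on.size == 0:
            break
        # Propose restoring a small random group of perturbed pixels to clean values.
        k = max(1, on.size // 20)
        drop = rng.choice(on, size=k, replace=False)
        trial = mask.copy()
        trial.flat[drop] = False
        candidate = np.where(trial, start, clean)
        queries += 1
        # Accept only if the hard label is still wrong (decision-based feedback).
        if oracle(candidate) != true_label:
            mask = trial
    return np.where(mask, start, clean), mask, queries

# Toy usage: an all-zero "video" (label 0) and an all-one starting point (label 1).
clean = np.zeros((4, 8, 8))
start = np.ones((4, 8, 8))
adv, mask, queries = sparse_decision_attack(clean, start, 0, hard_label_oracle)
print(int(mask.sum()), "of", mask.size, "pixels perturbed after", queries, "queries")
```

The number of `True` entries in `mask` is the ℓ0 norm of the perturbation, and the oracle is consulted only for its predicted label, which is what makes the setting decision-based.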


Pages: 1443-1454

Page count: 12
