Towards Decision-based Sparse Attacks on Video Recognition

Cited by: 1
Authors
Jiang, Kaixun [1]
Chen, Zhaoyu [1]
Zhou, Xinyu [2]
Zhang, Jingyu [1]
Hong, Lingyi [2]
Li, Bo [2, 3]
Wang, Yan [1]
Zhang, Wenqiang [1]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[3] Vivo Mobile Commun Co Ltd, Dongguan, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
adversarial examples; video action recognition; sparse attacks
DOI
10.1145/3581783.3611828
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies indicate that sparse attacks, which modify only a small set of input pixels under an ℓ0-norm constraint, threaten the security of deep learning models. While existing research has focused primarily on sparse attacks against image models, the robustness of video recognition models to such attacks remains largely unevaluated. To bridge this gap, we present the first study of sparse video attacks and propose an attack framework named V-DSA for the most challenging decision-based setting, in which the threat model returns only the predicted hard label. Specifically, V-DSA comprises two modules: a Cross-Modal Generator (CMG) for query-free transfer attacks on each frame, and an Optical flow Grouping Evolution algorithm (OGE) for query-efficient spatial-temporal attacks. CMG processes each frame to generate a transfer video that serves as the starting point of the attack, exploiting the feature similarity between image classification and video recognition models. OGE first initializes its populations from the transfer video and then leverages optical flow to establish temporal connections among the perturbed pixels in each frame, which reduces the parameter space and specifically breaks the temporal relationship between frames. Finally, OGE complements this optical flow modeling with grouping evolution, which realizes a coarse-to-fine attack and avoids falling into local optima. In addition, OGE keeps the perturbation temporally coherent while balancing the number of perturbed pixels per frame, further increasing the imperceptibility of the attack. Extensive experiments demonstrate that V-DSA achieves state-of-the-art performance in terms of both threat effectiveness and imperceptibility. We hope V-DSA can provide valuable insights into the security of video recognition systems.
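The decision-based sparse setting described in the abstract can be illustrated with a small toy sketch. This is not V-DSA: it has no Cross-Modal Generator, no optical flow, and no population-based evolution. It only shows the general idea of starting from an already misclassified video and shrinking the set of perturbed pixels using hard-label queries alone. The linear `hard_label` classifier, the `sparse_decision_attack` function, and all parameter names are hypothetical and chosen for illustration.

```python
import numpy as np

def hard_label(video, weights):
    """Toy stand-in for a black-box video model: returns only a class index."""
    return int(np.argmax(weights @ video.ravel()))

def sparse_decision_attack(clean, start, weights, true_label, rng,
                           rounds=300, group=8):
    """Greedily restore groups of pixels to their clean values while the
    hard label stays wrong. Illustrative only; not the authors' algorithm."""
    mask = np.ones(clean.shape, dtype=bool)  # True = pixel copied from `start`
    queries = 0
    for _ in range(rounds):
        active = np.flatnonzero(mask)
        if active.size == 0:
            break
        # Propose restoring a random group of currently perturbed pixels.
        pick = rng.choice(active, size=min(group, active.size), replace=False)
        trial = mask.copy()
        trial.flat[pick] = False
        candidate = np.where(trial, start, clean)
        queries += 1
        # Accept the sparser mask only if the model is still fooled.
        if hard_label(candidate, weights) != true_label:
            mask = trial
    return np.where(mask, start, clean), mask, queries

rng = np.random.default_rng(0)
clean = rng.random((4, 8, 8))                 # toy video: frames x height x width
weights = rng.normal(size=(3, clean.size))    # assumed 3-class linear model
true_label = hard_label(clean, weights)

# Build a starting video that is guaranteed to be misclassified: push every
# pixel toward an arbitrary other class (a stand-in for a transfer video).
other = (true_label + 1) % 3
direction = (weights[other] - weights[true_label]).reshape(clean.shape)
start = np.where(direction > 0, 1.0, 0.0)

adv, mask, queries = sparse_decision_attack(clean, start, weights, true_label, rng)
print("perturbed pixels:", int(mask.sum()), "of", clean.size, "| queries:", queries)
```

The loop accepts a sparser mask only after a query confirms the label is still wrong, so the returned video remains adversarial throughout; the group size plays a loose role similar to a coarse-to-fine schedule but is fixed here for simplicity.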
Pages: 1443-1454
Number of pages: 12