A Video Visual Security Metric Based on Spatiotemporal Self-Attention

Cited by: 0
Authors
Tang, Bo [1 ]
Li, Fengdong [1 ]
Liu, Jianbo [1 ]
Yang, Cheng [1 ]
Affiliations
[1] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China
Keywords
Visualization; Encryption; Measurement; Feature extraction; Correlation; Quality assessment; Image edge detection; Visual security index; regional correlation; visual persistence; unidirectional window self-attention; QUALITY ASSESSMENT; SELECTIVE ENCRYPTION; IMAGE; PRIVACY; CONTEXT;
DOI
10.1109/TIFS.2024.3459731
CLC number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The Visual Security Index (VSI) of encrypted videos measures the security of encryption algorithms by evaluating the visual information content, providing a critical evaluation criterion for selective encryption. A VSI for encrypted videos must assess security in both the spatial and temporal domains, yet existing visual security metrics, which rely on averaging, optical flow, and convolutions, fail to capture information leakage in the temporal domain effectively. This paper proposes a spatiotemporal self-attention-based video security assessment model called Spatiotemporal Self-Attention (StSA). In the spatial domain, windowed self-attention is used to calculate regional correlations within video frames; by introducing multi-layer outputs, a multi-depth self-attention network named Multi-Depth Swin-Transformer (MDST) is constructed for this computation. A weak-label calculation method based on edge similarity is proposed to derive frame and block scores from the video Mean Opinion Score (MOS), thereby supporting the pre-training of the spatial model. In the temporal domain, considering human visual persistence and the one-way relationship between video frames, temporal unidirectional window self-attention is proposed to calculate frame correlations along the temporal sequence. Finally, the visual security index score of an encrypted video is obtained by combining the spatiotemporal correlation changes between the encrypted and plaintext videos. Experimental results show that StSA achieves a Pearson Linear Correlation Coefficient (PLCC) of 0.955 and a Root Mean Squared Error (RMSE) of 0.458 on the encryption datasets. Compared with other visual security metrics, StSA demonstrates higher accuracy and correlation, effectively capturing spatiotemporal information leakage in encrypted videos and reflecting human perception of security.
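To make the temporal mechanism concrete, below is a minimal sketch of temporal unidirectional window self-attention as the abstract describes it: each frame attends only to itself and a small window of preceding frames, reflecting visual persistence and the one-way ordering of a video. This is an illustrative reconstruction, not the authors' code; the class name, single-head design, and window size are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalUnidirectionalWindowAttention(nn.Module):
    """Single-head sketch: each frame attends only to itself and the
    previous (window - 1) frames, i.e. a causal mask restricted to a
    sliding temporal window. Hypothetical, not the paper's code."""

    def __init__(self, dim: int, window: int = 4):
        super().__init__()
        self.window = window
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)   # joint query/key/value projection

    def forward(self, frames: torch.Tensor):
        # frames: (T, dim), one feature vector per video frame.
        q, k, v = self.qkv(frames).chunk(3, dim=-1)
        scores = (q @ k.t()) * self.scale    # (T, T) frame-to-frame scores
        t = torch.arange(frames.size(0), device=frames.device)
        # Allow attention only backwards in time and within the window.
        allowed = (t[None, :] <= t[:, None]) & (t[:, None] - t[None, :] < self.window)
        scores = scores.masked_fill(~allowed, float("-inf"))
        attn = F.softmax(scores, dim=-1)     # unidirectional frame correlations
        return attn @ v, attn
```

Comparing the (T, T) correlation maps produced for a plaintext clip and its encrypted version is one way to quantify temporal leakage, in the spirit of the spatiotemporal correlation changes the abstract combines into the final score.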
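The reported PLCC of 0.955 and RMSE of 0.458 are the standard agreement measures between predicted security scores and subjective MOS values; a minimal NumPy helper (hypothetical, not from the paper) computes them as:

```python
import numpy as np

def plcc_rmse(pred, mos):
    """Agreement between predicted visual security scores and MOS."""
    pred, mos = np.asarray(pred, dtype=float), np.asarray(mos, dtype=float)
    plcc = np.corrcoef(pred, mos)[0, 1]          # Pearson linear correlation
    rmse = np.sqrt(np.mean((pred - mos) ** 2))   # root mean squared error
    return plcc, rmse
```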
Pages: 9230 - 9244
Number of pages: 15
Related papers
50 records in total
  • [1] Spatiotemporal module for video saliency prediction based on self-attention
    Wang, Yuhao
    Liu, Zhuoran
    Xia, Yibo
    Zhu, Chunbo
    Zhao, Danpei
    [J]. IMAGE AND VISION COMPUTING, 2021, 112
  • [2] Self-Attention Based Video Summarization
    Li, Yiyi
    Wang, Jilong
[J]. Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (04): 652 - 659
  • [3] Self-Attention ConvLSTM for Spatiotemporal Prediction
    Lin, Zhihui
    Li, Maomao
    Zheng, Zhuobin
    Cheng, Yangyang
    Yuan, Chun
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11531 - 11538
  • [4] An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition
    Alfasly, Saghir
    Chui, Charles K.
    Jiang, Qingtang
    Lu, Jian
    Xu, Chen
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2496 - 2509
  • [5] Compressed Self-Attention for Deep Metric Learning
    Chen, Ziye
    Gong, Mingming
    Xu, Yanwu
    Wang, Chaohui
    Zhang, Kun
    Du, Bo
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3561 - 3568
  • [6] Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion
    Liang, Chaolei
    Zou, Wei
    Hu, Danfeng
    Wang, JiaJun
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKS AND INTERNET OF THINGS, CNIOT 2024, 2024, : 600 - 605
  • [7] Learning Hierarchical Self-Attention for Video Summarization
    Liu, Yen-Ting
    Li, Yu-Jhe
    Yang, Fu-En
    Chen, Shang-Fu
    Wang, Yu-Chiang Frank
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3377 - 3381
  • [8] Unsupervised Video Anomaly Detection with Self-Attention based Feature Aggregating
    Ye, Zhenhao
    Li, Yanlong
    Cui, Zhichao
    Liu, Yuehu
    Li, Li
    Wang, Le
    Zhang, Chi
    [J]. 2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 3551 - 3556
  • [9] Spatiotemporal visual attention architecture for video analysis
    Rapantzikos, K
    Tsapatsoulis, N
    Avrithis, Y
    [J]. 2004 IEEE 6TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2004, : 83 - 86
  • [10] Exploring Self-Attention for Visual Intersection Classification
    Nakata, Haruki
    Tanaka, Kanji
    Takeda, Koji
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2023, 27 (03) : 386 - 393