A Video Visual Security Metric Based on Spatiotemporal Self-Attention

Cited by: 0
Authors
Tang, Bo [1 ]
Li, Fengdong [1 ]
Liu, Jianbo [1 ]
Yang, Cheng [1 ]
Affiliations
[1] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China
Keywords
Visualization; Encryption; Measurement; Feature extraction; Correlation; Quality assessment; Image edge detection; Visual security index; regional correlation; visual persistence; unidirectional window self-attention; QUALITY ASSESSMENT; SELECTIVE ENCRYPTION; IMAGE; PRIVACY; CONTEXT;
DOI
10.1109/TIFS.2024.3459731
CLC number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The Visual Security Index (VSI) of encrypted videos measures the security of encryption algorithms by evaluating the visual information content, providing a critical evaluation criterion for selective encryption. A VSI for encrypted videos must assess security in both the spatial and temporal domains, yet existing visual security metrics, which rely on averaging, optical flow, and convolutions, fail to capture information leakage in the temporal domain effectively. This paper proposes a spatiotemporal self-attention-based video security assessment model called Spatiotemporal Self-Attention (StSA). In the spatial domain, windowed self-attention is used to calculate regional correlations within video frames; by introducing multi-layer outputs, a multi-depth self-attention network named Multi-Depth Swin-Transformer (MDST) is constructed for this computation. A weak-label calculation method based on edge similarity is proposed to derive frame and block scores from the video Mean Opinion Score (MOS), thereby supporting the pre-training of the spatial model. In the temporal domain, considering human visual persistence and the one-way relationship between video frames, temporal unidirectional window self-attention is proposed to calculate frame correlations along the temporal sequence. Finally, the visual security index score of an encrypted video is obtained by combining the spatiotemporal correlation changes between the encrypted and plaintext videos. Experimental results show that StSA achieves a Pearson Linear Correlation Coefficient (PLCC) of 0.955 and a Root Mean Squared Error (RMSE) of 0.458 on the encryption datasets. Compared with other visual security metrics, StSA demonstrates higher accuracy and correlation, effectively capturing spatiotemporal information leakage in encrypted videos and reflecting human perception of security.
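To make the temporal mechanism concrete, below is a minimal sketch of temporal unidirectional window self-attention as the abstract describes it: each frame attends only to itself and a small window of preceding frames, reflecting visual persistence and the one-way ordering of a video. This is an illustrative reconstruction, not the authors' code; the class name, single-head design, and window size are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalUnidirectionalWindowAttention(nn.Module):
    """Single-head sketch: each frame attends only to itself and the
    previous (window - 1) frames, i.e. a causal mask restricted to a
    sliding temporal window. Hypothetical, not the paper's code."""

    def __init__(self, dim: int, window: int = 4):
        super().__init__()
        self.window = window
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)   # joint query/key/value projection

    def forward(self, frames: torch.Tensor):
        # frames: (T, dim), one feature vector per video frame.
        q, k, v = self.qkv(frames).chunk(3, dim=-1)
        scores = (q @ k.t()) * self.scale    # (T, T) frame-to-frame scores
        t = torch.arange(frames.size(0), device=frames.device)
        # Allow attention only backwards in time and within the window.
        allowed = (t[None, :] <= t[:, None]) & (t[:, None] - t[None, :] < self.window)
        scores = scores.masked_fill(~allowed, float("-inf"))
        attn = F.softmax(scores, dim=-1)     # unidirectional frame correlations
        return attn @ v, attn
```

Comparing the (T, T) correlation maps produced for a plaintext clip and its encrypted version is one way to quantify temporal leakage, in the spirit of the spatiotemporal correlation changes the abstract combines into the final score.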
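The reported PLCC of 0.955 and RMSE of 0.458 are the standard agreement measures between predicted security scores and subjective MOS values; a minimal NumPy helper (hypothetical, not from the paper) computes them as:

```python
import numpy as np

def plcc_rmse(pred, mos):
    """Agreement between predicted visual security scores and MOS."""
    pred, mos = np.asarray(pred, dtype=float), np.asarray(mos, dtype=float)
    plcc = np.corrcoef(pred, mos)[0, 1]          # Pearson linear correlation
    rmse = np.sqrt(np.mean((pred - mos) ** 2))   # root mean squared error
    return plcc, rmse
```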
Pages: 9230 - 9244
Number of pages: 15
Related papers
50 records in total
  • [1] Spatiotemporal module for video saliency prediction based on self-attention
    Wang, Yuhao
    Liu, Zhuoran
    Xia, Yibo
    Zhu, Chunbo
    Zhao, Danpei
    [J]. IMAGE AND VISION COMPUTING, 2021, 112
  • [2] Self-Attention Based Video Summarization
    Li, Yiyi
    Wang, Jilong
[J]. Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (04): 652 - 659
  • [3] Self-Attention ConvLSTM for Spatiotemporal Prediction
    Lin, Zhihui
    Li, Maomao
    Zheng, Zhuobin
    Cheng, Yangyang
    Yuan, Chun
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11531 - 11538
  • [4] An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition
    Alfasly, Saghir
    Chui, Charles K.
    Jiang, Qingtang
    Lu, Jian
    Xu, Chen
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2496 - 2509
  • [5] Compressed Self-Attention for Deep Metric Learning
    Chen, Ziye
    Gong, Mingming
    Xu, Yanwu
    Wang, Chaohui
    Zhang, Kun
    Du, Bo
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3561 - 3568
  • [6] Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion
    Liang, Chaolei
    Zou, Wei
    Hu, Danfeng
    Wang, JiaJun
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKS AND INTERNET OF THINGS, CNIOT 2024, 2024, : 600 - 605
  • [7] Learning Hierarchical Self-Attention for Video Summarization
    Liu, Yen-Ting
    Li, Yu-Jhe
    Yang, Fu-En
    Chen, Shang-Fu
    Wang, Yu-Chiang Frank
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3377 - 3381
  • [8] Unsupervised Video Anomaly Detection with Self-Attention based Feature Aggregating
    Ye, Zhenhao
    Li, Yanlong
    Cui, Zhichao
    Liu, Yuehu
    Li, Li
    Wang, Le
    Zhang, Chi
    [J]. 2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 3551 - 3556
  • [9] Spatiotemporal visual attention architecture for video analysis
    Rapantzikos, K
    Tsapatsoulis, N
    Avrithis, Y
    [J]. 2004 IEEE 6TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2004, : 83 - 86
  • [10] Exploring Self-Attention for Visual Intersection Classification
    Nakata, Haruki
    Tanaka, Kanji
    Takeda, Koji
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2023, 27 (03) : 386 - 393