MGQFormer: Mask-Guided Query-Based Transformer for Image Manipulation Localization

Cited by: 0

Authors:

Zeng, Kunlun [1]

Cheng, Ri [1]

Tan, Weimin [1]

Yan, Bo [1]

Affiliations:

[1] Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Shanghai, People's Republic of China

Source:

Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 7 | 2024

Funding:

Natural Science Foundation of Shanghai

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning-based models have made great progress in image tampering localization, which aims to distinguish between manipulated and authentic regions. However, these models train inefficiently because they use ground-truth mask labels mainly through the cross-entropy loss, which prioritizes per-pixel precision but disregards the spatial location and shape details of manipulated regions. To address this problem, we propose a Mask-Guided Query-based Transformer Framework (MGQFormer), which uses ground-truth masks to guide the learnable query token (LQT) in identifying the forged regions. Specifically, we extract feature embeddings of ground-truth masks as the guiding query token (GQT) and feed GQT and LQT separately into MGQFormer to estimate fake regions. We then propose a mask-guided loss that reduces the feature distance between GQT and LQT, so that MGQFormer learns the position and shape information in the ground-truth mask labels. We also observe that this mask-guided training strategy has a significant impact on the convergence speed of MGQFormer training. Extensive experiments on multiple benchmarks show that our method significantly improves over state-of-the-art methods.
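
The training objective described in the abstract can be illustrated with a minimal sketch: a per-pixel cross-entropy term on the predicted mask plus a mask-guided term that pulls the LQT toward the GQT. The abstract does not give the exact form of the feature distance or how the two terms are weighted, so the mean squared distance, the `weight` parameter, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def pixel_cross_entropy(pred_probs, mask):
    """Mean binary cross-entropy over pixels.

    pred_probs: flat list of predicted tampering probabilities in [0, 1].
    mask: flat list of ground-truth labels (1 = manipulated, 0 = authentic).
    """
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(pred_probs, mask):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(mask)


def mask_guided_loss(lqt, gqt):
    """Assumed feature distance: mean squared difference between the
    learnable query token (LQT) and the guiding query token (GQT)."""
    return sum((a - b) ** 2 for a, b in zip(lqt, gqt)) / len(lqt)


def total_loss(pred_probs, mask, lqt, gqt, weight=1.0):
    """Hypothetical combined objective: localization loss plus a weighted
    mask-guided term. The weighting scheme is an assumption."""
    return pixel_cross_entropy(pred_probs, mask) + weight * mask_guided_loss(lqt, gqt)


# Toy inputs: a 4-pixel image and 3-dimensional query tokens.
mask = [0, 0, 1, 1]
pred = [0.1, 0.2, 0.7, 0.9]
lqt = [0.5, -0.2, 0.1]
gqt = [0.4, 0.0, 0.3]
print(total_loss(pred, mask, lqt, gqt, weight=0.5))
```

In this sketch the guidance term is zero only when the two tokens coincide, so minimizing it moves the LQT toward the embedding derived from the ground-truth mask, which is the effect the abstract attributes to the mask-guided loss.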


Pages: 6944-6952

Page count: 9
