UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization

Cited by: 2

Authors:

Li, Shuaibo [1,2]

Ma, Wei [1]

Guo, Jianwei [2]

Xu, Shibiao [3]

Li, Benchong [1]

Zhan, Xiaopeng [2]

Affiliations:

[1] Beijing Univ Technol, Beijing, Peoples R China

[2] Chinese Acad Sci, Inst Automat, MAIS, Beijing, Peoples R China

[3] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

NETWORKS;

DOI:

10.1109/CVPR52733.2024.01190

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

We present UnionFormer, a novel framework that integrates tampering clues across three views by unified learning for image manipulation detection and localization. Specifically, we construct a BSFI-Net to extract tampering features from RGB and noise views, achieving enhanced responsiveness to boundary artifacts while modulating spatial consistency at different scales. Additionally, to explore the inconsistency between objects as a new view of clues, we combine object consistency modeling with tampering detection and localization into a three-task unified learning process, allowing them to promote and improve mutually. Therefore, we acquire a unified manipulation discriminative representation under multi-scale supervision that consolidates information from three views. This integration facilitates highly effective concurrent detection and localization of tampering. We perform extensive experiments on diverse datasets, and the results show that the proposed approach outperforms state-of-the-art methods in tampering detection and localization.


Pages: 12523-12533

Page count: 11

