Learning Visual Prior via Generative Pre-Training

Times cited: 0
Authors
Xie, Jinheng [1 ]
Ye, Kai [2 ]
Li, Yudong [2 ]
Li, Yuexiang [3 ]
Lin, Kevin Qinghong [1 ]
Zheng, Yefeng [3 ]
Shen, Linlin [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Tencent YouTu Lab, Jarvis Res Ctr, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Various stuff and things in visual data possess specific traits, e.g., typical object locations and shapes, which deep neural networks can learn and implicitly represent as a visual prior. Such a prior impacts many vision tasks; in conditional image synthesis, for example, spatial conditions that fail to adhere to the prior can produce visually inaccurate synthetic results. This work aims to learn the visual prior explicitly and to enable customized sampling from it. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human poses, and instance masks, into sequences, VISORGPT models the visual prior through likelihood maximization. In addition, we investigate prompt engineering to unify the various kinds of visual locations and to enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world.
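The recipe the abstract describes, quantizing coordinates into discrete tokens, prefixing a prompt that names the annotation type, and pre-training a decoder-only model by next-token likelihood maximization, can be sketched as below. This is a minimal illustration under assumed details: the bin count, vocabulary layout, prompt-token ids, and the small model are placeholders, not the authors' released implementation.

    # A minimal sketch, not the authors' code: serialize bounding boxes into a
    # prompt-prefixed token sequence and pre-train a causal language model by
    # maximizing next-token likelihood. NUM_BINS, the special-token ids, and
    # TinyCausalLM are illustrative assumptions.
    import torch
    import torch.nn as nn

    NUM_BINS = 512  # quantization bins for normalized coordinates (assumed)

    def box_to_tokens(box, img_w, img_h):
        """Quantize one (x0, y0, x1, y1) box into four integer tokens."""
        x0, y0, x1, y1 = box
        norm = (x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h)
        return [min(int(v * NUM_BINS), NUM_BINS - 1) for v in norm]

    def boxes_to_sequence(prompt_ids, boxes, img_w, img_h, bos, eos):
        """Prompt tokens first (e.g. 'box; person; 1 instance'), then coordinates."""
        seq = [bos] + list(prompt_ids)
        for b in boxes:
            seq += box_to_tokens(b, img_w, img_h)
        return seq + [eos]

    class TinyCausalLM(nn.Module):
        """A small GPT-style decoder standing in for the pre-trained model."""
        def __init__(self, vocab, d=128, heads=4, layers=2):
            super().__init__()
            self.emb = nn.Embedding(vocab, d)
            layer = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, layers)
            self.head = nn.Linear(d, vocab)

        def forward(self, ids):
            # Causal mask so each position only attends to earlier tokens.
            mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
            return self.head(self.blocks(self.emb(ids), mask=mask))

    # Likelihood maximization: predict token t+1 from tokens 0..t.
    vocab = NUM_BINS + 100  # coordinate bins plus special/prompt tokens (assumed)
    model = TinyCausalLM(vocab)
    seq = boxes_to_sequence(prompt_ids=[513, 514],  # e.g. "box", "person"
                            boxes=[(30, 40, 200, 220)],
                            img_w=640, img_h=480, bos=515, eos=516)
    ids = torch.tensor([seq])
    logits = model(ids[:, :-1])  # shift-by-one next-token prediction
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab),
                                       ids[:, 1:].reshape(-1))
    loss.backward()  # one pre-training step on this sequence

At inference time, the same prompt prefix would be fed to the trained model and coordinate tokens sampled autoregressively, which is what enables the customized, prompt-conditioned sampling described above.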
Pages: 19
Related papers
50 records in total (items [31]-[40] shown)
  • [31] Visual Alignment Pre-training for Sign Language Translation
    Jiao, Peiqi
    Min, Yuecong
    Chen, Xilin
    COMPUTER VISION - ECCV 2024, PT XLII, 2025, 15100: 349-367
  • [32] Meta-Learning to Improve Pre-Training
    Raghu, Aniruddh
    Lorraine, Jonathan
    Kornblith, Simon
    McDermott, Matthew
    Duvenaud, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [33] Robust Pre-Training by Adversarial Contrastive Learning
    Jiang, Ziyu
    Chen, Tianlong
    Chen, Ting
    Wang, Zhangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [34] Multilingual Pre-training with Universal Dependency Learning
    Sun, Kailai
    Li, Zuchao
    Zhao, Hai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [35] Learning Chemical Rules of Retrosynthesis with Pre-training
    Jiang, Yinjie
    Wei, Ying
    Wu, Fei
    Huang, Zhengxing
    Kuang, Kun
    Wang, Zhihua
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023: 5113-5121
  • [36] Symbolizing Visual Features for Pre-training with Unlabeled Images
    Kamata, Yuichi
    Yamada, Moyuru
    Kato, Keizo
    Nakagawa, Akira
    Okatani, Takayuki
    PATTERN RECOGNITION, ACPR 2021, PT II, 2022, 13189: 490-503
  • [37] GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
    Luo, Chuwei
    Cheng, Changxu
    Zheng, Qi
    Yao, Cong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 7092-7101
  • [38] SEPT: Towards Scalable and Efficient Visual Pre-training
    Lin, Yiqi
    Zheng, Huabin
    Zhong, Huaping
    Zhu, Jinjing
    Li, Weijia
    He, Conghui
    Wang, Lin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023: 1622-1630
  • [39] Efficient Conditional Pre-training for Transfer Learning
    Chakraborty, Shuvam
    Uzkent, Burak
    Ayush, Kumar
    Tanmay, Kumar
    Sheehan, Evan
    Ermon, Stefano
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4240-4249
  • [40] MVP: Multimodality-Guided Visual Pre-training
    Wei, Longhui
    Xie, Lingxi
    Zhou, Wengang
    Li, Houqiang
    Tian, Qi
    COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690: 337-353