Learning Visual Prior via Generative Pre-Training

Cited by: 0
Authors
Xie, Jinheng [1 ]
Ye, Kai [2 ]
Li, Yudong [2 ]
Li, Yuexiang [3 ]
Lin, Kevin Qinghong [1 ]
Zheng, Yefeng [3 ]
Shen, Linlin [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Tencent YouTu Lab, Jarvis Res Ctr, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented in the model as a visual prior, e.g., over object location and shape. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can result in visually inaccurate synthetic results. This work aims to learn the visual prior explicitly and enable customized sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human poses, and instance masks, into sequences, VISORGPT can model the visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world.
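The abstract outlines the core mechanism: continuous spatial annotations are discretized into token sequences and an autoregressive model is trained by likelihood maximization, with a prompt unifying different annotation types. Below is a minimal, hypothetical Python/PyTorch sketch of that idea, not the authors' implementation; the quantization scheme, the helper names (quantize_box, serialize_scene), the prompt token ids, and the tiny GRU standing in for a GPT-style decoder are all assumptions made for illustration.

```python
# Hypothetical sketch (not the VISORGPT code) of the idea in the abstract:
# quantize bounding-box coordinates into discrete tokens, serialize them after a
# prompt, and train an autoregressive model by maximizing sequence likelihood
# (equivalently, minimizing next-token cross-entropy).
import torch
import torch.nn as nn

NUM_BINS = 512          # assumed number of coordinate bins
COORD_OFFSET = 100      # token ids [COORD_OFFSET, COORD_OFFSET + NUM_BINS) hold coordinates
VOCAB_SIZE = COORD_OFFSET + NUM_BINS

def quantize_box(box, img_w, img_h, num_bins=NUM_BINS):
    """Map continuous (x0, y0, x1, y1) coordinates to discrete coordinate tokens."""
    x0, y0, x1, y1 = box
    return [
        COORD_OFFSET + min(int(x0 / img_w * num_bins), num_bins - 1),
        COORD_OFFSET + min(int(y0 / img_h * num_bins), num_bins - 1),
        COORD_OFFSET + min(int(x1 / img_w * num_bins), num_bins - 1),
        COORD_OFFSET + min(int(y1 / img_h * num_bins), num_bins - 1),
    ]

def serialize_scene(boxes, img_w, img_h, prompt_tokens=(1, 2, 3)):
    """Prepend a prompt (here, placeholder ids that would encode e.g. the
    annotation type) to the quantized coordinates of every box in the scene."""
    seq = list(prompt_tokens)
    for box in boxes:
        seq.extend(quantize_box(box, img_w, img_h))
    return seq

class TinyAutoregressiveLM(nn.Module):
    """Deliberately small causal model used here in place of a GPT-style decoder."""
    def __init__(self, vocab_size=VOCAB_SIZE, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for a Transformer
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

# One toy training step: next-token prediction over the serialized scene.
model = TinyAutoregressiveLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

scene = serialize_scene([(10, 20, 120, 200), (50, 60, 90, 150)], img_w=640, img_h=480)
tokens = torch.tensor([scene])                  # shape: (batch=1, seq_len)
logits = model(tokens[:, :-1])                  # predict token t+1 from tokens <= t
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Sampling from such a model token by token, conditioned on a prompt, is what would allow customized draws from the learned prior; the prompt format and vocabulary layout above are illustrative only.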
Pages: 19