Learning Visual Prior via Generative Pre-Training

Times cited: 0
Authors
Xie, Jinheng [1 ]
Ye, Kai [2 ]
Li, Yudong [2 ]
Li, Yuexiang [3 ]
Lin, Kevin Qinghong [1 ]
Zheng, Yefeng [3 ]
Shen, Linlin [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Tencent YouTu Lab, Jarvis Res Ctr, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Various stuff and things in visual data possess specific traits, e.g., typical object locations and shapes, which deep neural networks can learn and implicitly represent as a visual prior. Such a prior impacts many vision tasks; in conditional image synthesis, for example, spatial conditions that fail to adhere to the prior can produce visually inaccurate synthetic results. This work aims to learn the visual prior explicitly and to enable customized sampling from it. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human poses, and instance masks, into sequences, VISORGPT models the visual prior through likelihood maximization. In addition, we investigate prompt engineering to unify the various kinds of visual locations and to enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world.
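The recipe the abstract describes, quantizing coordinates into discrete tokens, prefixing a prompt that names the annotation type, and pre-training a decoder-only model by next-token likelihood maximization, can be sketched as below. This is a minimal illustration under assumed details: the bin count, vocabulary layout, prompt-token ids, and the small model are placeholders, not the authors' released implementation.

    # A minimal sketch, not the authors' code: serialize bounding boxes into a
    # prompt-prefixed token sequence and pre-train a causal language model by
    # maximizing next-token likelihood. NUM_BINS, the special-token ids, and
    # TinyCausalLM are illustrative assumptions.
    import torch
    import torch.nn as nn

    NUM_BINS = 512  # quantization bins for normalized coordinates (assumed)

    def box_to_tokens(box, img_w, img_h):
        """Quantize one (x0, y0, x1, y1) box into four integer tokens."""
        x0, y0, x1, y1 = box
        norm = (x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h)
        return [min(int(v * NUM_BINS), NUM_BINS - 1) for v in norm]

    def boxes_to_sequence(prompt_ids, boxes, img_w, img_h, bos, eos):
        """Prompt tokens first (e.g. 'box; person; 1 instance'), then coordinates."""
        seq = [bos] + list(prompt_ids)
        for b in boxes:
            seq += box_to_tokens(b, img_w, img_h)
        return seq + [eos]

    class TinyCausalLM(nn.Module):
        """A small GPT-style decoder standing in for the pre-trained model."""
        def __init__(self, vocab, d=128, heads=4, layers=2):
            super().__init__()
            self.emb = nn.Embedding(vocab, d)
            layer = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, layers)
            self.head = nn.Linear(d, vocab)

        def forward(self, ids):
            # Causal mask so each position only attends to earlier tokens.
            mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
            return self.head(self.blocks(self.emb(ids), mask=mask))

    # Likelihood maximization: predict token t+1 from tokens 0..t.
    vocab = NUM_BINS + 100  # coordinate bins plus special/prompt tokens (assumed)
    model = TinyCausalLM(vocab)
    seq = boxes_to_sequence(prompt_ids=[513, 514],  # e.g. "box", "person"
                            boxes=[(30, 40, 200, 220)],
                            img_w=640, img_h=480, bos=515, eos=516)
    ids = torch.tensor([seq])
    logits = model(ids[:, :-1])  # shift-by-one next-token prediction
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab),
                                       ids[:, 1:].reshape(-1))
    loss.backward()  # one pre-training step on this sequence

At inference time, the same prompt prefix would be fed to the trained model and coordinate tokens sampled autoregressively, which is what enables the customized, prompt-conditioned sampling described above.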
Pages: 19
Related papers
50 records in total (items [31]-[40] shown)
  • [31] Visual Alignment Pre-training for Sign Language Translation
    Jiao, Peiqi
    Min, Yuecong
    Chen, Xilin
    COMPUTER VISION - ECCV 2024, PT XLII, 2025, 15100: 349-367
  • [32] Meta-Learning to Improve Pre-Training
    Raghu, Aniruddh
    Lorraine, Jonathan
    Kornblith, Simon
    McDermott, Matthew
    Duvenaud, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [33] Robust Pre-Training by Adversarial Contrastive Learning
    Jiang, Ziyu
    Chen, Tianlong
    Chen, Ting
    Wang, Zhangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [34] Multilingual Pre-training with Universal Dependency Learning
    Sun, Kailai
    Li, Zuchao
    Zhao, Hai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [35] Learning Chemical Rules of Retrosynthesis with Pre-training
    Jiang, Yinjie
    Wei, Ying
    Wu, Fei
    Huang, Zhengxing
    Kuang, Kun
    Wang, Zhihua
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023: 5113-5121
  • [36] Symbolizing Visual Features for Pre-training with Unlabeled Images
    Kamata, Yuichi
    Yamada, Moyuru
    Kato, Keizo
    Nakagawa, Akira
    Okatani, Takayuki
    PATTERN RECOGNITION, ACPR 2021, PT II, 2022, 13189: 490-503
  • [37] GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
    Luo, Chuwei
    Cheng, Changxu
    Zheng, Qi
    Yao, Cong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 7092-7101
  • [38] SEPT: Towards Scalable and Efficient Visual Pre-training
    Lin, Yiqi
    Zheng, Huabin
    Zhong, Huaping
    Zhu, Jinjing
    Li, Weijia
    He, Conghui
    Wang, Lin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023: 1622-1630
  • [39] Efficient Conditional Pre-training for Transfer Learning
    Chakraborty, Shuvam
    Uzkent, Burak
    Ayush, Kumar
    Tanmay, Kumar
    Sheehan, Evan
    Ermon, Stefano
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4240-4249
  • [40] MVP: Multimodality-Guided Visual Pre-training
    Wei, Longhui
    Xie, Lingxi
    Zhou, Wengang
    Li, Houqiang
    Tian, Qi
    COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690: 337-353