Learning Visual Prior via Generative Pre-Training

Cited by: 0
Authors
Xie, Jinheng [1 ]
Ye, Kai [2 ]
Li, Yudong [2 ]
Li, Yuexiang [3 ]
Lin, Kevin Qinghong [1 ]
Zheng, Yefeng [3 ]
Shen, Linlin [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Tencent YouTu Lab, Jarvis Res Ctr, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented in the model as a visual prior, e.g., over object location and shape. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can result in visually inaccurate synthetic results. This work aims to learn the visual prior explicitly and enable customized sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human poses, and instance masks, into sequences, VISORGPT can model the visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world.
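The abstract outlines the core mechanism: continuous spatial annotations are discretized into token sequences and an autoregressive model is trained by likelihood maximization, with a prompt unifying different annotation types. Below is a minimal, hypothetical Python/PyTorch sketch of that idea, not the authors' implementation; the quantization scheme, the helper names (quantize_box, serialize_scene), the prompt token ids, and the tiny GRU standing in for a GPT-style decoder are all assumptions made for illustration.

```python
# Hypothetical sketch (not the VISORGPT code) of the idea in the abstract:
# quantize bounding-box coordinates into discrete tokens, serialize them after a
# prompt, and train an autoregressive model by maximizing sequence likelihood
# (equivalently, minimizing next-token cross-entropy).
import torch
import torch.nn as nn

NUM_BINS = 512          # assumed number of coordinate bins
COORD_OFFSET = 100      # token ids [COORD_OFFSET, COORD_OFFSET + NUM_BINS) hold coordinates
VOCAB_SIZE = COORD_OFFSET + NUM_BINS

def quantize_box(box, img_w, img_h, num_bins=NUM_BINS):
    """Map continuous (x0, y0, x1, y1) coordinates to discrete coordinate tokens."""
    x0, y0, x1, y1 = box
    return [
        COORD_OFFSET + min(int(x0 / img_w * num_bins), num_bins - 1),
        COORD_OFFSET + min(int(y0 / img_h * num_bins), num_bins - 1),
        COORD_OFFSET + min(int(x1 / img_w * num_bins), num_bins - 1),
        COORD_OFFSET + min(int(y1 / img_h * num_bins), num_bins - 1),
    ]

def serialize_scene(boxes, img_w, img_h, prompt_tokens=(1, 2, 3)):
    """Prepend a prompt (here, placeholder ids that would encode e.g. the
    annotation type) to the quantized coordinates of every box in the scene."""
    seq = list(prompt_tokens)
    for box in boxes:
        seq.extend(quantize_box(box, img_w, img_h))
    return seq

class TinyAutoregressiveLM(nn.Module):
    """Deliberately small causal model used here in place of a GPT-style decoder."""
    def __init__(self, vocab_size=VOCAB_SIZE, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for a Transformer
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

# One toy training step: next-token prediction over the serialized scene.
model = TinyAutoregressiveLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

scene = serialize_scene([(10, 20, 120, 200), (50, 60, 90, 150)], img_w=640, img_h=480)
tokens = torch.tensor([scene])                  # shape: (batch=1, seq_len)
logits = model(tokens[:, :-1])                  # predict token t+1 from tokens <= t
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Sampling from such a model token by token, conditioned on a prompt, is what would allow customized draws from the learned prior; the prompt format and vocabulary layout above are illustrative only.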
Pages: 19