Dynamic Scene Graph Generation via Anticipatory Pre-training

Cited by: 17
Authors
Li, Yiming [1 ]
Yang, Xiaoshan [2 ,3 ,4 ]
Xu, Changsheng [2 ,3 ,4 ]
Affiliations
[1] Zhengzhou University (ZZU), School of Information Engineering, Zhengzhou, China
[2] Institute of Automation, Chinese Academy of Sciences (CASIA), National Laboratory of Pattern Recognition, Beijing, China
[3] University of Chinese Academy of Sciences (UCAS), School of Artificial Intelligence, Beijing, China
[4] Peng Cheng Laboratory (PCL), Shenzhen, China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
DOI
10.1109/CVPR52688.2022.01350
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Humans can not only see the collection of objects in a visual scene but also identify the relationships between those objects. A visual relationship can be abstracted into the semantic representation of a triple ⟨subject, predicate, object⟩, and the set of such triples forms a scene graph, which conveys rich information for visual understanding. Because objects move, the visual relationship between two objects in a video may change over time, which makes dynamically generating scene graphs from videos more complicated and challenging than conventional image-based static scene graph generation. Inspired by the human ability to infer visual relationships, we propose a novel Transformer-based anticipatory pre-training paradigm that explicitly models the temporal correlation of visual relationships across frames to improve dynamic scene graph generation. In the pre-training stage, the model predicts the visual relationships of the current frame from the previous frames, extracting intra-frame spatial information with a spatial encoder and inter-frame temporal correlations with a progressive temporal encoder. In the fine-tuning stage, we reuse the spatial encoder and the progressive temporal encoder, and additionally combine the information of the current frame to predict its visual relationships. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the Action Genome dataset.
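The abstract describes a two-encoder Transformer architecture trained with an anticipatory objective. The minimal PyTorch sketch below is only one possible reading of that description, not the authors' released code: the module names (SpatialEncoder, ProgressiveTemporalEncoder, AnticipatoryPretrainModel) and all hyperparameters (feature size, layer counts, number of predicate classes) are hypothetical choices made for illustration.

import torch
import torch.nn as nn

class SpatialEncoder(nn.Module):
    """Models intra-frame spatial context among object-pair features (hypothetical design)."""
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, pair_feats):                      # (B, num_pairs, d_model)
        return self.encoder(pair_feats)

class ProgressiveTemporalEncoder(nn.Module):
    """Aggregates inter-frame temporal correlations over the previous frames only."""
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, frame_feats):                     # (B, T_prev, d_model)
        t = frame_feats.size(1)
        # Causal mask: each frame attends only to itself and earlier frames.
        causal = torch.triu(torch.full((t, t), float("-inf"), device=frame_feats.device), diagonal=1)
        out = self.encoder(frame_feats, mask=causal)
        return out[:, -1]                               # summary of the observed context

class AnticipatoryPretrainModel(nn.Module):
    """Pre-training objective: anticipate the current frame's predicates from previous frames."""
    def __init__(self, d_model=256, num_predicates=26):
        super().__init__()
        self.spatial = SpatialEncoder(d_model)
        self.temporal = ProgressiveTemporalEncoder(d_model)
        self.classifier = nn.Linear(d_model, num_predicates)

    def forward(self, prev_pair_feats):                 # (B, T_prev, num_pairs, d_model)
        b, t, p, d = prev_pair_feats.shape
        spatial_out = self.spatial(prev_pair_feats.view(b * t, p, d))  # intra-frame context
        frame_summary = spatial_out.mean(dim=1).view(b, t, d)          # one token per frame
        context = self.temporal(frame_summary)                         # inter-frame context
        return self.classifier(context)                 # anticipated predicate logits

if __name__ == "__main__":
    model = AnticipatoryPretrainModel()
    prev = torch.randn(2, 4, 5, 256)                    # 2 clips, 4 previous frames, 5 object pairs
    print(model(prev).shape)                            # torch.Size([2, 26])

In the fine-tuning stage described in the abstract, the current frame's own features would additionally be fused with this anticipated context before classification; that fusion step is intentionally omitted from the sketch.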
Pages: 13864-13873
Number of pages: 10
Related Papers
50 records in total
  • [1] Graph Pre-training for AMR Parsing and Generation
    Bai, Xuefeng
    Chen, Yulong
    Zhang, Yue
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6001 - 6015
  • [2] Pre-training on dynamic graph neural networks
    Chen, Ke-Jia
    Zhang, Jiajun
    Jiang, Linpu
    Wang, Yunyun
    Dai, Yuxuan
    [J]. NEUROCOMPUTING, 2022, 500 : 679 - 687
  • [3] Cognize Yourself: Graph Pre-Training via Core Graph Cognizing and Differentiating
    Yu, Tao
    Fu, Yao
    Hu, Linghui
    Wang, Huizhao
    Jiang, Weihao
    Pu, Shiliang
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 2413 - 2422
  • [4] Dictionary Temporal Graph Network via Pre-training Embedding Distillation
    Liu, Yipeng
    Zheng, Fang
    [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VI, ICIC 2024, 2024, 14880 : 336 - 347
  • [5] Pre-training via Paraphrasing
    Lewis, Mike
    Ghazvininejad, Marjan
    Ghosh, Gargi
    Aghajanyan, Armen
    Wang, Sida
    Zettlemoyer, Luke
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Dynamic Scene Graph Generation via Temporal Prior Inference
    Wang, Shuang
    Gao, Lianli
    Lyu, Xinyu
    Guo, Yuyu
    Zeng, Pengpeng
    Song, Jingkuan
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5793 - 5801
  • [7] Graph Strength for Identification of Pre-training Desynchronization
    Zapata Castano, Frank Yesid
    Gomez Morales, Oscar Wladimir
    Alvarez Meza, Andres Marino
    Castellanos Dominguez, Cesar German
    [J]. INTELLIGENT TECHNOLOGIES: DESIGN AND APPLICATIONS FOR SOCIETY, CITIS 2022, 2023, 607 : 36 - 44
  • [8] Cross-Lingual Natural Language Generation via Pre-Training
    Chi, Zewen
    Dong, Li
    Wei, Furu
    Wang, Wenhui
    Mao, Xian-Ling
    Huang, Heyan
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7570 - 7577
  • [9] PreSTU: Pre-Training for Scene-Text Understanding
    Kil, Jihyung
    Changpinyo, Soravit
    Chen, Xi
    Hu, Hexiang
    Goodman, Sebastian
    Chao, Wei-Lun
    Soricut, Radu
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15224 - 15234
  • [10] Unifying Event Detection and Captioning as Sequence Generation via Pre-training
    Zhang, Qi
    Song, Yuqing
    Jin, Qin
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 363 - 379