Video Scene Parsing with Predictive Feature Learning

Cited by: 66
Authors
Jin, Xiaojie [1]
Li, Xin [2]
Xiao, Huaxin [2]
Shen, Xiaohui [3]
Lin, Zhe [3]
Yang, Jimei [3]
Chen, Yunpeng [2]
Dong, Jian [5]
Liu, Luoqi [4]
Jie, Zequn [4]
Feng, Jiashi [2]
Yan, Shuicheng [2, 5]
Affiliations
[1] NUS, NUS Grad Sch Integrat Sci & Engn NGS, Singapore, Singapore
[2] NUS, Dept ECE, Singapore, Singapore
[3] Adobe Res, San Jose, CA USA
[4] Tencent AI Lab, Seattle, WA USA
[5] 360 AI Inst, Ellicott City, MD USA
DOI
10.1109/ICCV.2017.595
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video scene parsing is challenging for two reasons: first, it is non-trivial to learn meaningful video representations that produce temporally consistent labeling maps; second, this learning process becomes more difficult when labeled video training data are insufficient. In this work, we propose a unified framework to address both problems, which is, to our knowledge, the first model to employ predictive feature learning in video scene parsing. The predictive feature learning is carried out through two predictive tasks: frame prediction and predictive parsing. Experiments show that the predictive features learned by our model significantly enhance video parsing performance when combined with a standard image parsing network. Notably, the performance gain from predictive learning is almost costless, as the features are learned from a large amount of unlabeled video data in an unsupervised way. Extensive experiments on two challenging datasets, Cityscapes and CamVid, demonstrate the effectiveness of our model, showing remarkable improvement over well-established baselines.
Pages: 5581-5589
Page count: 9