Video Scene Parsing with Predictive Feature Learning

Cited by: 66
Authors
Jin, Xiaojie [1]
Li, Xin [2]
Xiao, Huaxin [2]
Shen, Xiaohui [3]
Lin, Zhe [3]
Yang, Jimei [3]
Chen, Yunpeng [2]
Dong, Jian [5]
Liu, Luoqi [4]
Jie, Zequn [4]
Feng, Jiashi [2]
Yan, Shuicheng [2, 5]
Affiliations
[1] NUS, NUS Grad Sch Integrat Sci & Engn NGS, Singapore, Singapore
[2] NUS, Dept ECE, Singapore, Singapore
[3] Adobe Res, San Jose, CA USA
[4] Tencent AI Lab, Seattle, WA USA
[5] 360 AI Inst, Ellicott City, MD USA
DOI
10.1109/ICCV.2017.595
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video scene parsing is challenging for two reasons: first, it is non-trivial to learn meaningful video representations that produce temporally consistent labeling maps; second, this learning process becomes more difficult when labeled video training data is insufficient. In this work, we propose a unified framework to address both problems, which is, to our knowledge, the first model to employ predictive feature learning in video scene parsing. The predictive feature learning is carried out through two predictive tasks: frame prediction and predictive parsing. Experiments show that the learned predictive features significantly enhance video parsing performance when combined with a standard image parsing network. Notably, the performance gain brought by predictive learning is almost costless, as the features are learned from a large amount of unlabeled video data in an unsupervised way. Extensive experiments on two challenging datasets, Cityscapes and CamVid, demonstrate the effectiveness of our model, showing remarkable improvement over well-established baselines.
Pages: 5581-5589
Page count: 9
Related papers
50 in total
  • [1] Video scene parsing: An overview of deep learning methods and datasets
    Yan, Xiyu
    Gong, Huihui
    Jiang, Yong
    Xia, Shu-Tao
    Zheng, Feng
    You, Xinge
    Shao, Ling
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2020, 201
  • [2] Consensus Feature Network for Scene Parsing
    Wu, Tianyi
    Tang, Sheng
    Zhang, Rui
    Guo, Guodong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 24 : 3208 - 3217
  • [3] Feature boosting with efficient attention for scene parsing
    Singh, Vivek
    Sharma, Shailza
    Cuzzolin, Fabio
    NEUROCOMPUTING, 2024, 601
  • [4] On detection of gradual scene changes for parsing of video data
    Song, SMH
    Kwon, TH
    Kim, WM
    Kim, H
    Rhee, BD
    STORAGE AND RETRIEVAL FOR IMAGE AND VIDEO DATABASES VI, 1997, 3312 : 404 - 413
  • [5] Binary feature representation learning for scene retrieval in micro-video
    Jie Guo
    Xiushan Nie
    Muwei Jian
    Yilong Yin
    Multimedia Tools and Applications, 2019, 78 : 24539 - 24552
  • [6] Binary feature representation learning for scene retrieval in micro-video
    Guo, Jie
    Nie, Xiushan
    Jian, Muwei
    Yin, Yilong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (17) : 24539 - 24552
  • [7] High Resolution Feature Recovering for Accelerating Urban Scene Parsing
    Zhang, Rui
    Tang, Sheng
    Liu, Luoqi
    Zhang, Yongdong
    Li, Jintao
    Yan, Shuicheng
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 1156 - 1162
  • [8] Nonparametric scene parsing with adaptive feature relevance and semantic context
    Singh, Gautam
    Kosecka, Jana
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 3151 - 3157
  • [9] Weakly-Supervised Video Scene Co-parsing
    Zhong, Guangyu
    Tsai, Yi-Hsuan
    Yang, Ming-Hsuan
    COMPUTER VISION - ACCV 2016, PT I, 2017, 10111 : 20 - 36
  • [10] Feature context learning for human parsing
    Tengteng Huang
    Yongchao Xu
    Song Bai
    Yongpan Wang
    Xiang Bai
    Science China Information Sciences, 2019, 62