Cross-task weakly supervised learning from instructional videos

Cited by: 78
Authors
Zhukov, Dimitri [1, 2]
Alayrac, Jean-Baptiste [1, 3]
Cinbis, Ramazan Gokberk [4]
Fouhey, David [5]
Laptev, Ivan [1, 2]
Sivic, Josef [1, 2, 6]
Affiliations
[1] Inria, Rocquencourt, France
[2] PSL Res Univ, Ecole Normale Super, Dept Informat, Paris, France
[3] DeepMind, London, England
[4] Middle East Tech Univ, Ankara, Turkey
[5] Univ Michigan, Ann Arbor, MI 48109 USA
[6] Czech Tech Univ, CIIRC Czech Inst Informat Robot & Cybernet, Prague, Czech Republic
Funding
European Research Council;
DOI
10.1109/CVPR.2019.00365
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: "pour egg" should be trained jointly with other tasks involving "pour" and "egg". We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Past data does not permit systematic studying of sharing and so we also gather a new dataset, CrossTask, aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level and that our component model can parse previously unseen tasks by virtue of its compositionality.
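The component-sharing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a step classifier is the average of one weight vector per word in the step name, and the names `step_weights`, `score`, `components`, and `frame` are hypothetical. It shows how "pour egg" and "pour milk" reuse the same "pour" component, so a step can be scored from components learned elsewhere.

```python
from typing import Dict, List

Vector = List[float]


def step_weights(step: str, components: Dict[str, Vector]) -> Vector:
    """Build a step classifier by averaging the weights of its word components.

    Averaging is an assumption made for this sketch, not a detail from the abstract.
    """
    parts = [components[word] for word in step.split()]
    dim = len(parts[0])
    return [sum(p[i] for p in parts) / len(parts) for i in range(dim)]


def score(step: str, feature: Vector, components: Dict[str, Vector]) -> float:
    """Linear score of a video-frame feature under the composed step classifier."""
    weights = step_weights(step, components)
    return sum(w * x for w, x in zip(weights, feature))


# Toy component weights (hypothetical values); "pour" is shared by both steps.
components = {
    "pour": [1.0, 0.0, 0.0],
    "egg": [0.0, 1.0, 0.0],
    "milk": [0.0, 0.0, 1.0],
}

# A toy frame feature that responds to "pour" and "egg" but not "milk".
frame = [1.0, 1.0, 0.0]

egg_score = score("pour egg", frame, components)
milk_score = score("pour milk", frame, components)
print(egg_score, milk_score)
```

Because both steps are composed from the same "pour" vector, any training signal that updates that vector would affect every step containing the word, which is the kind of sharing the abstract describes.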
Pages: 3532-3540
Page count: 9
Related papers
50 in total
  • [1] Semi-Weakly-Supervised Learning of Complex Actions from Instructional Task Videos
    Shen, Yuhan
    Elhamifar, Ehsan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3334 - 3344
  • [2] Learning class-agnostic masks with cross-task refinement for weakly supervised semantic segmentation
    Xu, Lian
    Bennamoun, Mohammed
    Boussaid, Farid
    Ouyang, Wanli
    Xu, Dan
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (27): : 20189 - 20205
  • [3] Learning class-agnostic masks with cross-task refinement for weakly supervised semantic segmentation
    Lian Xu
    Mohammed Bennamoun
    Farid Boussaid
    Wanli Ouyang
    Dan Xu
    Neural Computing and Applications, 2023, 35 : 20189 - 20205
  • [4] Cross-epoch learning for weakly supervised anomaly detection in surveillance videos
    Yu, Shenghao
    Wang, Chong
    Mao, Qiaomei
    Li, Yuqi
    Wu, Jiafei
    IEEE Signal Processing Letters, 2021, 28 : 2137 - 2141
  • [5] Cross-Epoch Learning for Weakly Supervised Anomaly Detection in Surveillance Videos
    Yu, Shenghao
    Wang, Chong
    Mao, Qiaomei
    Li, Yuqi
    Wu, Jiafei
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 2137 - 2141
  • [6] Weakly Supervised Learning of Heterogeneous Concepts in Videos
    Shah, Sohil
    Kulkarni, Kuldeep
    Biswas, Arijit
    Gandhi, Ankit
    Deshmukh, Om
    Davis, Larry S.
    COMPUTER VISION - ECCV 2016, PT VI, 2016, 9910 : 275 - 293
  • [7] Weakly-Supervised Action Learning in Procedural Task Videos via Process Knowledge Decomposition
    Zou, Minghao
    Zeng, Qingtian
    Zhang, Xue
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5575 - 5588
  • [8] Cross-Task Crowdsourcing
    Mo, Kaixiang
    Zhong, Erheng
    Yang, Qiang
    19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), 2013, : 677 - 685
  • [9] Cross-task Attention Mechanism for Dense Multi-task Learning
    Lopes, Ivan
    Tuan-Hung Vu
    de Charette, Raoul
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2328 - 2337
  • [10] Learning cross-task relations for panoptic driving perception
    Song, Zhanjie
    Zhao, Linqing
    PATTERN RECOGNITION LETTERS, 2023, 176 : 89 - 95