Unsupervised Learning from Narrated Instruction Videos

Times cited: 142
Authors
Alayrac, Jean-Baptiste [1 ,2 ]
Bojanowski, Piotr [1 ]
Agrawal, Nishant [1 ,3 ]
Sivic, Josef [1 ]
Laptev, Ivan [1 ]
Lacoste-Julien, Simon [2 ]
Affiliations
[1] Ecole Normale Super, CNRS, INRIA, WILLOW Project Team,Dept Informat,UMR 8548, Paris, France
[2] Ecole Normale Super, CNRS, INRIA, SIERRA Project Team,Dept Informat,UMR 8548, Paris, France
[3] IIIT Hyderabad, Hyderabad, Andhra Pradesh, India
Funding
European Research Council
DOI
10.1109/CVPR.2016.495
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are threefold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.
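To illustrate what it means for the video side to respect a step order fixed by the narration, here is a minimal Python sketch. It is not the optimization used in the paper; it is a plain dynamic program under simplifying assumptions: (i) the text stage has already produced K steps in a fixed order, (ii) a hypothetical cost `cost[t][k]` of assigning step `k` to time slot `t` (e.g. a video segment) is given, and (iii) every step occurs exactly once. The function name `ordered_step_assignment` is hypothetical.

```python
from typing import List, Tuple


def ordered_step_assignment(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Place K ordered steps at strictly increasing time indices at minimum total cost.

    cost[t][k] is a hypothetical cost of assigning step k to time slot t.
    Simplifying assumption (not from the paper): every step occurs exactly
    once, in the order fixed by the narration.
    """
    T = len(cost)
    K = len(cost[0]) if T else 0
    if K == 0:
        return 0.0, []
    if K > T:
        raise ValueError("need at least as many time slots as steps")

    INF = float("inf")
    # best[k][t]: minimum cost of placing steps 0..k with step k at time t
    best = [[INF] * T for _ in range(K)]
    # back[k][t]: time chosen for step k-1 on that optimal path
    back = [[-1] * T for _ in range(K)]
    for t in range(T):
        best[0][t] = cost[t][0]

    for k in range(1, K):
        # Running minimum of best[k-1][s] over all s < t enforces the ordering.
        run_min, run_arg = INF, -1
        for t in range(T):
            if t > 0 and best[k - 1][t - 1] < run_min:
                run_min, run_arg = best[k - 1][t - 1], t - 1
            if run_arg >= 0:
                best[k][t] = run_min + cost[t][k]
                back[k][t] = run_arg

    # Best end time for the last step, then follow the back-pointers.
    t_end = min(range(T), key=lambda t: best[K - 1][t])
    times = [t_end]
    for k in range(K - 1, 0, -1):
        times.append(back[k][times[-1]])
    times.reverse()
    return best[K - 1][t_end], times
```

For example, `ordered_step_assignment([[9, 0], [0, 9]])` returns `(18, [0, 1])`: placing step 1 before step 0 would cost 0, but the ordering constraint rules it out. Keeping a running minimum over earlier time slots keeps the cost at O(K·T) instead of O(K·T²).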
Pages: 4575-4583
Page count: 9
Related papers (50 in total)
  • [21] Unsupervised Contrastive Learning of Image Representations from Ultrasound Videos with Hard Negative Mining
    Basu, Soumen
    Singla, Somanshu
    Gupta, Mayank
    Rana, Pratyaksha
    Gupta, Pankaj
    Arora, Chetan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT IV, 2022, 13434: 423-433
  • [22] Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
    Casser, Vincent
    Pirk, Soeren
    Mahjourian, Reza
    Angelova, Anelia
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 8001-8008
  • [23] Narrated animated solution videos in a mastery setting
    Schroeder, Noah
    Gladding, Gary
    Gutmann, Brianne
    Stelzer, Timothy
    PHYSICAL REVIEW SPECIAL TOPICS-PHYSICS EDUCATION RESEARCH, 2015, 11 (01)
  • [24] Using Multilayer Videos for Remote Learning: Videos of Session Guidance, Content Instruction, and Activity
    Chen, Li-Ting
    Liu, Leping
    Tretheway, Phillip
    COMPUTERS IN THE SCHOOLS, 2022, 38 (04): 322-353
  • [25] VideoWhisper: Towards Unsupervised Learning of Discriminative Features of Videos with RNN
    Zhao, Na
    Zhang, Hanwang
    Zhang, Mingxing
    Hong, Richang
    Wang, Meng
    Chua, Tat-seng
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 277-282
  • [26] Unsupervised Learning of Long-Term Motion Dynamics for Videos
    Luo, Zelun
    Peng, Boya
    Huang, De-An
    Alahi, Alexandre
    Li Fei-Fei
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 7101-7110
  • [27] Unsupervised Learning of Geometry from Videos with Edge-Aware Depth-Normal Consistency
    Yang, Zhenheng
    Wang, Peng
    Xu, Wei
    Zhao, Liang
    Nevatia, Ramakant
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018: 7493-7500
  • [28] An Unsupervised Method for Anomaly Detection from Crowd Videos
    Guler, Puren
    Temizel, Alptekin
    Temizel, Tugba Taskaya
    2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013
  • [29] Unsupervised Learning for Stereo Matching Using Single-View Videos
    Phuc Nguyen Hong
    Ahn, Chang Wook
    IEEE ACCESS, 2020, 8 (08): 73804-73815
  • [30] Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos
    Singh, Gautam
    Wu, Yi-Fu
    Ahn, Sungjin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022