Unsupervised Learning from Narrated Instruction Videos

Cited by: 142

Authors:

Alayrac, Jean-Baptiste [1,2]

Bojanowski, Piotr [1]

Agrawal, Nishant [1,3]

Sivic, Josef [1]

Laptev, Ivan [1]

Lacoste-Julien, Simon [2]

Affiliations:

[1] Ecole Normale Super, CNRS, INRIA, WILLOW Project Team, Dept Informat, UMR 8548, Paris, France

[2] Ecole Normale Super, CNRS, INRIA, SIERRA Project Team, Dept Informat, UMR 8548, Paris, France

[3] IIIT Hyderabad, Hyderabad, Andhra Pradesh, India

Source:

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2016

Funding:

European Research Council

DOI:

10.1109/CVPR.2016.495

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.

Pages: 4575-4583

Page count: 9
