COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

Cited by: 104
Authors
Tang, Yansong [1]
Ding, Dajun [2]
Rao, Yongming [1]
Zheng, Yu [1]
Zhang, Danyang [1]
Zhao, Lili [2]
Lu, Jiwen [1]
Zhou, Jie [1]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Meitu Inc, Xiamen, Fujian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RECOGNITION;
DOI
10.1109/CVPR.2019.00130
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A substantial number of instructional videos are available on the Internet, enabling us to acquire knowledge for completing various tasks. However, most existing datasets for instructional video analysis are limited in diversity and scale, which leaves them far from many real-world applications where more diverse activities occur. Moreover, organizing and harnessing such data remains a great challenge. To address these problems, we introduce a large-scale dataset called "COIN" for COmprehensive INstructional video analysis. Organized with a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles, gadgets) related to our daily life. With a newly developed toolbox, all the videos are annotated effectively with a series of step descriptions and the corresponding temporal boundaries. Furthermore, we propose a simple yet effective method to capture the dependencies among different steps, which can be easily plugged into conventional proposal-based action detection methods for localizing important steps in instructional videos. To provide a benchmark for instructional video analysis, we evaluate a range of approaches on the COIN dataset under different evaluation criteria. We expect that the introduction of the COIN dataset will promote future in-depth research on instructional video analysis in the community.
Pages: 1207-1216
Page count: 10
Related papers
  • [1] Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
    Tang, Yansong
    Lu, Jiwen
    Zhou, Jie
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (09) : 3138 - 3153
  • [2] The Jester Dataset: A Large-Scale Video Dataset of Human Gestures
    Materzynska, Joanna
    Berger, Guillaume
    Bax, Ingo
    Memisevic, Roland
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 2874 - 2882
  • [3] VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
    Miao, Jiaxu
    Wei, Yunchao
    Wu, Yu
    Liang, Chen
    Li, Guangrui
    Yang, Yi
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4131 - 4141
  • [4] A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video
    Oh, Sangmin
    Hoogs, Anthony
    Perera, Amitha
    Cuntoor, Naresh
    Chen, Chia-Chih
    Lee, Jong Taek
    Mukherjee, Saurajit
    Aggarwal, J. K.
    Lee, Hyungtae
    Davis, Larry
    Swears, Eran
    Wang, Xiaoyang
    Ji, Qiang
    Reddy, Kishore
    Shah, Mubarak
    Vondrick, Carl
    Pirsiavash, Hamed
    Ramanan, Deva
    Yuen, Jenny
    Torralba, Antonio
    Song, Bi
    Fong, Anesco
    Roy-Chowdhury, Amit
    Desai, Mita
    [J]. 2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011,
  • [5] A Large-scale TV Dataset for Partial Video Copy Detection
    Van-Hao Le
    Delalandre, Mathieu
    Conte, Donatello
    [J]. IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT III, 2022, 13233 : 388 - 399
  • [6] Large-Scale Analysis of the Docker Hub Dataset
    Zhao, Nannan
    Tarasov, Vasily
    Albahar, Hadeel
    Anwar, Ali
    Rupprecht, Lukas
    Skourtis, Dimitrios
    Warke, Amit S.
    Mohamed, Mohamed
    Butt, Ali R.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2019, : 215 - 224
  • [7] Show Me a Video: A Large-Scale Narrated Video Dataset for Coherent Story Illustration
    Lu, Yu
    Ni, Feiyue
    Wang, Haofan
    Guo, Xiaofeng
    Zhu, Linchao
    Yang, Zongxin
    Song, Ruihua
    Cheng, Lele
    Yang, Yi
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2456 - 2466
  • [8] SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval
    Jiang, Qing-Yuan
    He, Yi
    Li, Gen
    Lin, Jian
    Li, Lei
    Li, Wu-Jun
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5280 - 5288
  • [9] MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection
    Corona, Kellie
    Osterdahl, Katie
    Collins, Roderic
    Hoogs, Anthony
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1059 - 1067
  • [10] LDPolypVideo Benchmark: A Large-Scale Colonoscopy Video Dataset of Diverse Polyps
    Ma, Yiting
    Chen, Xuejin
    Cheng, Kai
    Li, Yang
    Sun, Bin
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT V, 2021, 12905 : 387 - 396