Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

Cited by: 81
Authors
Sun, Chen [1]
Shetty, Sanketh [2]
Sukthankar, Rahul [2]
Nevatia, Ram [1]
Affiliations
[1] University of Southern California, Los Angeles, CA 90089 USA
[2] Google Inc, Mountain View, CA 94043 USA
Keywords
Fine-grained action localization; domain transfer; LSTM
DOI
10.1145/2733373.2806226
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We address the problem of fine-grained action localization from temporally untrimmed web videos. We assume that only weak video-level annotations are available for training. The goal is to use these weak labels to identify temporal segments corresponding to the actions, and learn models that generalize to unconstrained web videos. We find that web images queried by action names serve as well-localized highlights for many actions, but are noisily labeled. To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output. This is achieved by cross-domain transfer between video frames and web images, using pre-trained deep convolutional neural networks. We then use the localized action frames to train action recognition models with long short-term memory networks. We collect a fine-grained sports action data set FGA-240 of more than 130,000 YouTube videos. It has 240 fine-grained actions under 85 sports activities. Convincing results are shown on the FGA-240 data set, as well as the THUMOS 2014 localization data set with untrimmed training videos.
Pages: 371-380
Number of pages: 10