What If We Do Not Have Multiple Videos of the Same Action? - Video Action Localization Using Web Images

Cited by: 15
Authors
Sultani, Waqas [1]
Shah, Mubarak [1]
Affiliations
[1] Univ Cent Florida, CRCV, Orlando, FL 32816 USA
DOI
10.1109/CVPR.2016.122
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper tackles the problem of spatio-temporal action localization in a video, without assuming the availability of multiple videos or any prior annotations. The action is localized using images downloaded from the internet by querying the action name. Given web images, we first dampen image noise using a random walk and avoid distracting backgrounds within images using image action proposals. Then, given a video, we generate multiple spatio-temporal action proposals. We suppress proposals generated by camera motion and background by exploiting optical flow gradients within proposals. To obtain the most action-representative proposals, we propose to reconstruct action proposals in the video by leveraging the action proposals in images. Moreover, we preserve the temporal smoothness of the video and reconstruct all proposal bounding boxes jointly, using constraints that push the coefficients for each bounding box toward a common consensus, thus enforcing coefficient similarity across multiple frames. We solve this optimization problem using a variant of the two-metric projection algorithm. Finally, the video proposal that has the lowest reconstruction cost and is motion salient is used to localize the action. Our method is applicable not only to trimmed videos but also to action localization in untrimmed videos, which is a very challenging problem. We present extensive experiments on trimmed as well as untrimmed datasets to validate the effectiveness of the proposed approach.
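As a rough illustration of the joint reconstruction step described in the abstract, the Python sketch below reconstructs the per-frame boxes of one video proposal from image proposal features while pulling the per-frame coefficients toward a common consensus. This record does not state the objective, so everything specific here is an assumption: the least-squares loss, the non-negativity constraint, the mean-based consensus penalty, the weight `lam`, and the function names. Plain projected gradient descent is used in place of the two-metric projection variant mentioned in the abstract.

```python
import numpy as np


def reconstruct_proposal(D, X, lam=1.0, n_iter=200):
    """Reconstruct per-frame box features X from image proposal features D.

    D: (d, k) array, one column per image action proposal (assumed features).
    X: (d, T) array, one column per frame box of a single video proposal.
    Returns C: (k, T) non-negative coefficients, one column per frame.

    Assumed objective (not taken from the paper):
        0.5 * ||X - D C||_F^2 + 0.5 * lam * sum_t ||c_t - mean_t(c_t)||^2
        subject to C >= 0.
    """
    k, T = D.shape[1], X.shape[1]
    C = np.zeros((k, T))
    # 1 / (upper bound on the gradient's Lipschitz constant) keeps each
    # projected gradient step from increasing the objective.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + lam)
    for _ in range(n_iter):
        consensus_gap = C - C.mean(axis=1, keepdims=True)
        grad = D.T @ (D @ C - X) + lam * consensus_gap
        # Gradient step followed by projection onto the non-negative orthant.
        C = np.maximum(C - step * grad, 0.0)
    return C


def reconstruction_cost(D, X, C, lam=1.0):
    """Value of the assumed objective for coefficients C."""
    residual = 0.5 * np.sum((X - D @ C) ** 2)
    consensus_gap = C - C.mean(axis=1, keepdims=True)
    return residual + 0.5 * lam * np.sum(consensus_gap ** 2)


def rank_proposals(D, proposals, lam=1.0):
    """Return proposal indices sorted by increasing reconstruction cost.

    proposals: list of (d, T_i) arrays, one per video proposal.
    Motion saliency filtering is assumed to have been applied beforehand.
    """
    costs = [
        reconstruction_cost(D, X, reconstruct_proposal(D, X, lam), lam)
        for X in proposals
    ]
    return [int(i) for i in np.argsort(costs)]
```

Under these assumptions, a proposal whose boxes resemble the web image proposals and stay consistent across frames receives a low cost, so `rank_proposals` places it first. How the paper combines this cost with motion saliency is not specified in this record.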
Pages: 1077-1085
Page count: 9