What if we do not have multiple videos of the same action? - Video Action Localization Using Web Images

Times cited: 15

Authors:

Sultani, Waqas [1]

Shah, Mubarak [1]

Affiliation:

[1] Univ Cent Florida, CRCV, Orlando, FL 32816 USA

Source:

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2016

DOI:

10.1109/CVPR.2016.122

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper tackles the problem of spatio-temporal action localization in a video without assuming the availability of multiple videos or any prior annotations. The action is localized using images downloaded from the internet by querying the action name. Given the web images, we first dampen image noise using a random walk and avoid distracting backgrounds within images using image action proposals. Then, given a video, we generate multiple spatio-temporal action proposals. We suppress proposals generated by camera motion and background by exploiting optical flow gradients within the proposals. To obtain the most action-representative proposals, we reconstruct the action proposals in the video from the action proposals in the images. Moreover, we preserve the temporal smoothness of the video and reconstruct all proposal bounding boxes jointly, using constraints that push the coefficients for each bounding box toward a common consensus and thus enforce coefficient similarity across multiple frames. We solve this optimization problem using a variant of the two-metric projection algorithm. Finally, the video proposal that has the lowest reconstruction cost and is motion salient is used to localize the action. Our method is applicable not only to trimmed videos but also to action localization in untrimmed videos, which is a very challenging problem. We present extensive experiments on trimmed as well as untrimmed datasets to validate the effectiveness of the proposed approach.
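The abstract's first step, dampening web-image noise with a random walk, can be illustrated with a generic random walk with restart (PageRank-style) on an image-similarity graph: images that resemble many others accumulate probability mass, while off-topic images end up with low scores and can be discarded. This is a minimal sketch under stated assumptions, not the authors' formulation. The per-image feature vectors, the cosine-similarity edge weights, the restart parameter `alpha = 0.85`, and the helper names `random_walk_scores` and `filter_noisy` (with its `keep_ratio`) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def random_walk_scores(features: List[Sequence[float]], alpha: float = 0.85,
                       iters: int = 100, tol: float = 1e-10) -> List[float]:
    """Hypothetical helper: stationary distribution of a random walk with
    restart on a similarity graph whose nodes are images."""
    n = len(features)
    # Non-negative similarity weights between distinct images (no self-loops).
    w = [[max(cosine(features[i], features[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Row-normalize into a transition matrix; an isolated image jumps uniformly.
    p = []
    for row in w:
        s = sum(row)
        p.append([v / s for v in row] if s > 0 else [1.0 / n] * n)
    # Power iteration: r[j] <- (1 - alpha) / n + alpha * sum_i r[i] * p[i][j].
    r = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - alpha) / n + alpha * sum(r[i] * p[i][j] for i in range(n))
               for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r


def filter_noisy(features: List[Sequence[float]],
                 keep_ratio: float = 0.5) -> List[int]:
    """Hypothetical helper: indices of the highest-scoring images, in order."""
    scores = random_walk_scores(features)
    k = max(1, int(round(keep_ratio * len(features))))
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Three mutually similar "action" images and one off-topic outlier.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
    print(filter_noisy(feats, keep_ratio=0.75))
```

With the toy features in the usage block, the outlier `[0.0, 1.0]` receives the lowest score, so keeping the top 75% retains only the three mutually similar images. The restart term keeps the walk well defined even when an image has no similar neighbours; in practice the features would come from some image descriptor, which this sketch does not specify.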

Pages: 1077-1085

Number of pages: 9
