Coarse-to-Fine Localization of Temporal Action Proposals

Times cited: 24
Authors
Long, Fuchen [1]
Yao, Ting [2]
Qiu, Zhaofan [1]
Tian, Xinmei [1]
Mei, Tao [2]
Luo, Jiebo [3]
Affiliations
[1] Univ Sci & Technol China, Elect Engn & Informat Sci, Hefei 230027, Peoples R China
[2] JD AI Res, Vis & Multimedia Lab, Beijing 100105, Peoples R China
[3] Univ Rochester, Dept Comp Sci, Rochester, NY 14604 USA
Keywords
Proposals; Videos; Painting; Brushes; Microsoft Windows; Task analysis; Feature extraction; Action Proposals; Action Recognition; Action Detection; Video Captioning
DOI
10.1109/TMM.2019.2943204
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Localizing temporal action proposals in long videos is a fundamental challenge in video analysis (e.g., action detection and recognition or dense video captioning). Most existing approaches overlook the hierarchical granularity of actions and thus fail to discriminate fine-grained action proposals (e.g., hand washing laundry or changing a tire in vehicle repair). In this paper, we propose a novel coarse-to-fine temporal proposal (CFTP) approach that localizes temporal action proposals by exploring different action granularities. CFTP consists of three stages: a coarse proposal network (CPN) that generates long action proposals, a temporal convolutional anchor network (CAN) that localizes finer proposals, and a proposal reranking network (PRN) that further identifies proposals from the previous stages. Specifically, CPN exploits three complementary actionness curves (namely pointwise, pairwise, and recurrent curves) that represent actions at different levels to generate coarse proposals, while CAN refines these proposals with a multiscale cascaded 1D-convolutional anchor network. In contrast to existing works, our coarse-to-fine approach progressively localizes fine-grained action proposals. We conduct extensive experiments on two action benchmarks (THUMOS14 and ActivityNet v1.3) and demonstrate the superior performance of our approach compared with state-of-the-art techniques on various video understanding tasks.
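The abstract says that coarse proposals are derived from several per-frame actionness curves. The sketch below is a minimal illustration of one generic way to turn such curves into candidate segments. It fuses the curves, groups consecutive frames whose score reaches a threshold, repeats this at several thresholds, and measures overlap with temporal IoU. Several parts are assumptions for illustration: the plain averaging used for fusion, the threshold values, and every function name. The sketch does not reproduce CPN, CAN, or PRN as defined in the paper.

```python
from typing import List, Optional, Sequence, Tuple

# A proposal is (start, end) in frame indices; end is exclusive (hypothetical convention).
Proposal = Tuple[int, int]


def fuse_curves(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame actionness curves of equal length.

    Plain averaging is an assumed fusion rule, used here for illustration only.
    """
    if not curves:
        raise ValueError("need at least one curve")
    n = len(curves[0])
    if any(len(c) != n for c in curves):
        raise ValueError("all curves must have the same length")
    return [sum(c[t] for c in curves) / len(curves) for t in range(n)]


def group_proposals(scores: Sequence[float], threshold: float) -> List[Proposal]:
    """Return maximal runs of consecutive frames whose score is >= threshold."""
    proposals: List[Proposal] = []
    start: Optional[int] = None
    for t, s in enumerate(scores):
        if s >= threshold and start is None:
            start = t  # a run of "action" frames begins
        elif s < threshold and start is not None:
            proposals.append((start, t))  # the run ends just before frame t
            start = None
    if start is not None:  # close a run that reaches the last frame
        proposals.append((start, len(scores)))
    return proposals


def coarse_proposals(
    curves: Sequence[Sequence[float]],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),  # hypothetical default thresholds
) -> List[Proposal]:
    """Fuse the curves, then group at several thresholds, dropping exact repeats."""
    fused = fuse_curves(curves)
    seen = set()
    out: List[Proposal] = []
    for th in thresholds:
        for p in group_proposals(fused, th):
            if p not in seen:
                seen.add(p)
                out.append(p)
    return out


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Temporal intersection-over-union of two [start, end) segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pointwise = [0.1, 0.8, 0.9, 0.2]  # hypothetical per-frame scores
    pairwise = [0.1, 0.6, 0.4, 0.2]
    props = coarse_proposals([pointwise, pairwise], thresholds=(0.5, 0.68))
    print(props)  # [(1, 3), (1, 2)]
    print(temporal_iou(props[0], props[1]))  # 0.5
```

Grouping at several thresholds yields nested segments of different lengths from the same curve. In a coarse-to-fine pipeline, a later stage would then refine the boundaries of these candidates and rescore them.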
Pages: 1577-1590
Number of pages: 14
Related papers
50 in total
  • [1] Temporal Action Localization With Coarse-to-Fine Network
    Zhang, Min
    Hu, Haiyang
    Li, Zhongjin
    [J]. IEEE ACCESS, 2022, 10 : 96378 - 96387
  • [2] A Coarse-to-Fine Boundary Localization method for Naturalistic Driving Action Recognition
    Ding, Guanchen
    Han, Wenwei
    Wang, Chenglong
    Cui, Mingpeng
    Zhou, Lin
    Pan, Dianbo
    Wang, Jiayi
    Zhang, Junxi
    Chen, Zhenzhong
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 3233 - 3240
  • [3] Exploring Coarse-to-Fine Action Token Localization and Interaction for Fine-grained Video Action Recognition
    Sun, Baoli
    Ye, Xinchen
    Wang, Zhihui
    Li, Haojie
    Wang, Zhiyong
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5070 - 5078
  • [4] A coarse-to-fine temporal action detection method combining light and heavy networks
    Zhao, Fan
    Wang, Wen
    Wu, Yu
    Wang, Kaixuan
    Kang, Xiaobing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (01) : 879 - 898
  • [5] Iris localization with dual coarse-to-fine strategy
    Feng, Xinhua
    Fang, Chi
    Ding, Xiaoqing
    Wu, Youshou
    [J]. 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 2006, : 553 - +
  • [6] Learning Coarse and Fine Features for Precise Temporal Action Localization
    Kim, Ji-Hwan
    Heo, Jae-Pil
    [J]. IEEE ACCESS, 2019, 7 : 149797 - 149809
  • [7] Coarse-to-Fine Localization of Underwater Acoustic Communication Receivers
    He, Pan
    Shen, Lu
    Henson, Benjamin
    Zakharov, Yuriy V.
    [J]. SENSORS, 2022, 22 (18)
  • [8] Recursive Coarse-to-Fine Localization for Fast Object Detection
    Pedersoli, Marco
    Gonzalez, Jordi
    Bagdanov, Andrew D.
    Villanueva, Juan J.
    [J]. COMPUTER VISION - ECCV 2010, PT VI, 2010, 6316 : 280 - +
  • [9] EFFICIENT HUMAN ACTION DETECTION: A COARSE-TO-FINE STRATEGY
    Wu, Xian
    Lai, Jianhuang
    Chen, Xilin
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 701 - 704