Spotting Temporally Precise, Fine-Grained Events in Video

Cited by: 10
Authors
Hong, James [1]
Zhang, Haotian [1]
Gharbi, Michael [2]
Fisher, Matthew [2]
Fatahalian, Kayvon [1]
Affiliations
[1] Stanford University, Stanford, CA 94305, USA
[2] Adobe Research, San Francisco, CA, USA
Source
Funding
National Science Foundation (NSF);
Keywords
Temporally precise spotting; Video understanding;
DOI
10.1007/978-3-031-19833-5_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time at which events occur). Precise spotting requires models to reason globally about the full time scale of actions and locally to identify the subtle frame-to-frame appearance and motion differences that identify events during these actions. Surprisingly, we find that top-performing solutions to prior video understanding tasks such as action detection and segmentation do not simultaneously meet both requirements. In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. We demonstrate that E2E-Spot significantly outperforms recent baselines adapted to the precise spotting task from the video action detection, segmentation, and spotting literature. Finally, we contribute new annotations and splits to several fine-grained sports action datasets to make these datasets suitable for future work on precise spotting.
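The abstract defines precise spotting as detecting the exact moment (frame) at which an event occurs. One common way to score such predictions, assumed here for illustration and not taken from this record, is per-class average precision where a predicted event counts as correct only if it falls within a tolerance of `delta` frames of a still-unmatched ground-truth event. The function names `match_within_tolerance` and `average_precision` below are hypothetical, and this is a minimal sketch rather than the authors' evaluation code.

```python
from typing import List, Tuple


def match_within_tolerance(preds: List[Tuple[int, float]],
                           gt_frames: List[int],
                           delta: int) -> List[bool]:
    """Greedy matching (assumed scheme): predictions, taken in descending
    score order, each claim the closest still-unmatched ground-truth frame
    within +/- delta frames. Returns hit flags in descending-score order."""
    order = sorted(range(len(preds)), key=lambda i: preds[i][1], reverse=True)
    used = [False] * len(gt_frames)
    hits = []
    for i in order:
        frame = preds[i][0]
        best, best_dist = -1, delta + 1
        for j, g in enumerate(gt_frames):
            dist = abs(frame - g)
            if not used[j] and dist <= delta and dist < best_dist:
                best, best_dist = j, dist
        if best >= 0:
            used[best] = True
        hits.append(best >= 0)
    return hits


def average_precision(preds: List[Tuple[int, float]],
                      gt_frames: List[int],
                      delta: int) -> float:
    """Non-interpolated AP for one event class: mean of the precision values
    at the rank of each true positive, divided by the number of GT events."""
    if not gt_frames:
        return 0.0
    tp = 0
    total = 0.0
    for rank, hit in enumerate(match_within_tolerance(preds, gt_frames, delta), start=1):
        if hit:
            tp += 1
            total += tp / rank
    return total / len(gt_frames)


# Toy example: two ground-truth events and three scored predictions (frame, score).
gt = [100, 250]
preds = [(101, 0.9), (240, 0.8), (252, 0.7)]
for d in (1, 2, 10):
    print(d, f"{average_precision(preds, gt, d):.3f}")
# 1 0.500
# 2 0.833
# 10 1.000
```

The same predictions score perfectly under a loose 10-frame tolerance but only 0.5 under a 1-frame tolerance, which illustrates why spotting at tight tolerances demands the frame-level precision the abstract describes.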
Pages: 33-51
Page count: 19