FINE-GRAINED POSE TEMPORAL MEMORY MODULE FOR VIDEO POSE ESTIMATION AND TRACKING

Cited: 0
Authors
Wang, Chaoyi [1 ]
Hua, Yang [2 ]
Song, Tao [1 ]
Xue, Zhengui [1 ]
Ma, Ruhui [1 ]
Robertson, Neil [2 ]
Guan, Haibing [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Queens Univ Belfast, Belfast, Antrim, Northern Ireland
Keywords
video pose estimation and tracking; keypoint occlusion
DOI
10.1109/ICASSP39728.2021.9413650
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Video pose estimation and tracking has improved markedly in recent years with advances in image pose estimation. However, many cases remain challenging, such as body-part occlusion, fast body motion, camera zooming, and complex backgrounds. Most existing methods use temporal information only to obtain more precise human bounding boxes, or apply it only in the tracking stage, and thus fail to improve the accuracy of pose estimation itself. To better address these problems and exploit temporal information efficiently and effectively, we present a novel structure, called the pose temporal memory module, which can be flexibly integrated into top-down pose estimation frameworks. In our proposed module, the temporal information stored in the pose temporal memory is aggregated into the current frame's features. We also transfer compositional de-attention (CoDA) to address the keypoint occlusion problem unique to this task, and propose a novel keypoint feature replacement that recovers from severe detection errors under fine-grained keypoint-level guidance. To verify the generality and effectiveness of our proposed method, we integrate our module into two widely used pose estimation frameworks and obtain notable improvements on the PoseTrack dataset with only a small additional computational cost.
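The abstract's central idea, aggregating features stored in a temporal memory into the current frame's features, can be illustrated with a minimal sketch. This is not the paper's actual module: it assumes a simple scaled dot-product attention over stacked per-frame features with a residual fusion, and all function and variable names (`aggregate_memory`, `current`, `memory`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_memory(current, memory):
    """Fuse features from past frames into the current frame feature.

    current: (N, C) array of N spatial locations with C channels.
    memory:  (T, N, C) features retained from T past frames.
    Returns a (N, C) temporally enhanced feature map.
    """
    keys = memory.reshape(-1, memory.shape[-1])            # (T*N, C)
    # Attention weights of each current location over all memory locations.
    attn = softmax(current @ keys.T / np.sqrt(current.shape[-1]), axis=-1)
    context = attn @ keys                                  # (N, C) aggregated memory
    return current + context                               # residual fusion

# Usage: 3 memory frames, 4 spatial locations, 8 channels.
rng = np.random.default_rng(0)
cur = rng.standard_normal((4, 8))
mem = rng.standard_normal((3, 4, 8))
out = aggregate_memory(cur, mem)
```

The residual form keeps the current-frame feature dominant while the attention term injects temporal context, which is one plausible reason such a module adds little computational cost.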
Pages: 2205-2209 (5 pages)