FINE-GRAINED POSE TEMPORAL MEMORY MODULE FOR VIDEO POSE ESTIMATION AND TRACKING

Cited: 0
Authors
Wang, Chaoyi [1 ]
Hua, Yang [2 ]
Song, Tao [1 ]
Xue, Zhengui [1 ]
Ma, Ruhui [1 ]
Robertson, Neil [2 ]
Guan, Haibing [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Queens Univ Belfast, Belfast, Antrim, Northern Ireland
Keywords
video pose estimation and tracking; keypoint occlusion
DOI
10.1109/ICASSP39728.2021.9413650
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Video pose estimation and tracking has improved markedly in recent years with advances in image pose estimation. However, many cases remain challenging, such as body-part occlusion, fast body motion, camera zooming, and complex backgrounds. Most existing methods use temporal information only to obtain more precise human bounding boxes, or apply it only in the tracking stage, and thus fail to improve the accuracy of pose estimation itself. To better address these problems and exploit temporal information efficiently and effectively, we present a novel structure, called the pose temporal memory module, which can be flexibly integrated into top-down pose estimation frameworks. In our proposed module, the temporal information stored in the pose temporal memory is aggregated into the current frame's features. We also transfer compositional de-attention (CoDA) to address the keypoint occlusion problem unique to this task, and propose a novel keypoint feature replacement that recovers from severe detection errors under fine-grained keypoint-level guidance. To verify the generality and effectiveness of our proposed method, we integrate our module into two widely used pose estimation frameworks and obtain notable improvements on the PoseTrack dataset with only a small additional computational cost.
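The abstract's central idea, aggregating features stored in a temporal memory into the current frame's features, can be illustrated with a minimal sketch. This is not the paper's actual module: it assumes a simple scaled dot-product attention over stacked per-frame features with a residual fusion, and all function and variable names (`aggregate_memory`, `current`, `memory`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_memory(current, memory):
    """Fuse features from past frames into the current frame feature.

    current: (N, C) array of N spatial locations with C channels.
    memory:  (T, N, C) features retained from T past frames.
    Returns a (N, C) temporally enhanced feature map.
    """
    keys = memory.reshape(-1, memory.shape[-1])            # (T*N, C)
    # Attention weights of each current location over all memory locations.
    attn = softmax(current @ keys.T / np.sqrt(current.shape[-1]), axis=-1)
    context = attn @ keys                                  # (N, C) aggregated memory
    return current + context                               # residual fusion

# Usage: 3 memory frames, 4 spatial locations, 8 channels.
rng = np.random.default_rng(0)
cur = rng.standard_normal((4, 8))
mem = rng.standard_normal((3, 4, 8))
out = aggregate_memory(cur, mem)
```

The residual form keeps the current-frame feature dominant while the attention term injects temporal context, which is one plausible reason such a module adds little computational cost.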
Pages: 2205-2209 (5 pages)