Integrating Pose and Mask Predictions for Multi-person in Videos

Cited by: 1

Authors:

Heo, Miran [1,2]

Hwang, Sukjun [1]

Oh, Seoung Wug [2]

Lee, Joon-Young [2]

Kim, Seon Joo [1]

Affiliations:

[1] Yonsei Univ, Seoul, South Korea

[2] Adobe Res, San Jose, CA USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 | 2022

DOI:

10.1109/CVPRW56347.2022.00299

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In real-world applications for video editing, humans are arguably the most important objects. When editing videos of humans, the efficient tracking of fine-grained masks and body joints is the fundamental requirement. In this paper, we propose a simple and efficient system for jointly tracking pose and segmenting high-quality masks for all humans in the video. We design a pipeline that globally tracks pose and locally segments fine-grained masks. Specifically, CenterTrack is first employed to track human poses by viewing the whole scene, and then the proposed local segmentation network leverages the pose information as a powerful query to carry out high-quality segmentation. Furthermore, we adopt a highly light-weight MLP-Mixer layer within the segmentation network that can efficiently propagate the query pose throughout the region of interest with minimal overhead. For the evaluation, we collect a new benchmark called KineMask which includes various appearances and actions. The experimental results demonstrate that our method has superior fine-grained segmentation performance. Moreover, it runs at 33 fps, achieving a great balance of speed and accuracy compared to the prevailing online Video Instance Segmentation methods.

Pages: 2656-2665

Page count: 10
