Integrating Pose and Mask Predictions for Multi-person in Videos

Cited by: 1
Authors
Heo, Miran [1, 2]
Hwang, Sukjun [1]
Oh, Seoung Wug [2]
Lee, Joon-Young [2]
Kim, Seon Joo [1]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe Res, San Jose, CA USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 | 2022
DOI
10.1109/CVPRW56347.2022.00299
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In real-world applications for video editing, humans are arguably the most important objects. When editing videos of humans, the efficient tracking of fine-grained masks and body joints is the fundamental requirement. In this paper, we propose a simple and efficient system for jointly tracking pose and segmenting high-quality masks for all humans in the video. We design a pipeline that globally tracks pose and locally segments fine-grained masks. Specifically, CenterTrack is first employed to track human poses by viewing the whole scene, and then the proposed local segmentation network leverages the pose information as a powerful query to carry out high-quality segmentation. Furthermore, we adopt a highly light-weight MLP-Mixer layer within the segmentation network that can efficiently propagate the query pose throughout the region of interest with minimal overhead. For the evaluation, we collect a new benchmark called KineMask which includes various appearances and actions. The experimental results demonstrate that our method has superior fine-grained segmentation performance. Moreover, it runs at 33 fps, achieving a great balance of speed and accuracy compared to the prevailing online Video Instance Segmentation methods.
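The abstract mentions an MLP-Mixer layer that propagates the query pose across the region of interest. As a rough illustration only, the following is a minimal sketch of a generic MLP-Mixer layer (token mixing followed by channel mixing, each with a residual connection). It is not the authors' network: the token layout (region-of-interest tokens plus one pose-query token in the last row), all sizes, the omission of biases and learnable normalization parameters, and every function name here are assumptions made for the example.

```python
import math
import random


def layer_norm(row, eps=1e-6):
    # Normalize one token over its channels (no learnable scale/shift in this sketch).
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def gelu(v):
    # tanh approximation of GELU.
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mlp(x, w1, w2):
    # Two linear maps with a GELU in between (biases omitted for brevity).
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def rand_matrix(rng, rows, cols, std=0.02):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def init_params(num_tokens, channels, token_hidden, channel_hidden, seed=0):
    # Hypothetical random initialization; a real model would learn these weights.
    rng = random.Random(seed)
    return {
        "token": (rand_matrix(rng, num_tokens, token_hidden),
                  rand_matrix(rng, token_hidden, num_tokens)),
        "channel": (rand_matrix(rng, channels, channel_hidden),
                    rand_matrix(rng, channel_hidden, channels)),
    }


def mixer_layer(x, params):
    # x: num_tokens rows, each holding `channels` values.
    # Token mixing: transpose so the MLP acts along the token axis,
    # letting every token (including the query) influence every other token.
    normed = [layer_norm(row) for row in x]
    x = add(x, transpose(mlp(transpose(normed), *params["token"])))
    # Channel mixing: the MLP acts on each token independently.
    normed = [layer_norm(row) for row in x]
    return add(x, mlp(normed, *params["channel"]))


# Assumed toy setup: 4 region tokens plus 1 pose-query token (last row), 8 channels.
rng = random.Random(1)
num_tokens, channels = 5, 8
tokens = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(num_tokens)]
params = init_params(num_tokens, channels, token_hidden=16, channel_hidden=32)
out = mixer_layer(tokens, params)

# Replace only the query token and rerun to see its effect on the other tokens.
tokens_b = [row[:] for row in tokens]
tokens_b[-1] = [rng.gauss(0.0, 1.0) for _ in range(channels)]
out_b = mixer_layer(tokens_b, params)
```

In this sketch the token-mixing step is what lets a change in the query row alter the outputs of the other rows; the channel-mixing step alone would leave them unaffected.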
Pages: 2656-2665
Page count: 10