SADNet: Generating immersive virtual reality avatars by real-time monocular pose estimation

Cited by: 0
Authors
Jiang, Ling [1 ]
Xiong, Yuan [1 ]
Wang, Qianqian [1 ]
Chen, Tong [1 ]
Wu, Wei [1 ]
Zhou, Zhong [1 ,2 ,3 ]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Zhongguancun Lab, Beijing, Peoples R China
[3] Beihang Univ, POB 6863, 37 Xueyuan Rd, Beijing, Peoples R China
Keywords
3D avatar; computer animation; human pose estimation
DOI
10.1002/cav.2233
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline Code
081202; 0835
Abstract
Generating immersive virtual reality avatars is a challenging task in VR/AR applications: physical human body poses must be mapped to avatars in virtual scenes for an immersive user experience. However, most existing work is time-consuming and limited by datasets, and therefore fails to satisfy the immersive and real-time requirements of VR systems. In this paper, we aim to generate real-time 3D virtual reality avatars from a monocular camera to solve these problems. Specifically, we first design a self-attention distillation network (SADNet) for effective human pose estimation, guided by a pre-trained teacher. Second, we propose a lightweight pose mapping method for human avatars that uses the camera model to map 2D poses to 3D avatar keypoints, generating real-time human avatars with pose consistency. Finally, we integrate our framework into a VR system, displaying the generated 3D pose-driven avatars on helmet-mounted display devices for an immersive user experience. We evaluate SADNet on two publicly available datasets. Experimental results show that SADNet achieves a state-of-the-art trade-off between speed and accuracy. In addition, we conducted a user study on the performance and immersion of the virtual reality avatars; the results show that the pose-driven 3D human avatars generated by our method are smooth and attractive.
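Illustrative sketch (not from the paper): the pose-mapping step described in the abstract can be read as back-projecting 2D keypoints to 3D camera space through a pinhole camera model. The Python snippet below shows one minimal way to do this; the helper backproject_keypoints, the intrinsics K, and the per-keypoint depths are all assumptions, since the abstract does not give the authors' exact formulation or their source of depth.

import numpy as np

def backproject_keypoints(kp_2d, depths, K):
    """Back-project 2D pose keypoints to 3D camera-space points
    using a pinhole camera model (hypothetical helper, not the
    authors' implementation).

    kp_2d  : (N, 2) array of pixel coordinates (u, v)
    depths : (N,) array of per-keypoint depths Z (assumed available;
             the abstract does not state how depth is obtained)
    K      : (3, 3) camera intrinsic matrix
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = kp_2d[:, 0], kp_2d[:, 1]
    # Invert the pinhole projection u = fx*X/Z + cx, v = fy*Y/Z + cy
    X = (u - cx) * depths / fx
    Y = (v - cy) * depths / fy
    return np.stack([X, Y, depths], axis=-1)  # (N, 3)

# Example: map 17 estimated 2D joints to 3D with a unit depth guess.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
kp_2d = np.random.rand(17, 2) * [1280, 720]   # stand-in for 2D pose output
kp_3d = backproject_keypoints(kp_2d, np.ones(17), K)
print(kp_3d.shape)  # (17, 3)

In practice the depths would have to come from elsewhere, for example from skeleton length constraints on the driven avatar; a single monocular view cannot recover them from the 2D keypoints alone.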
Pages: 15
Related Papers
50 records in total
  • [1] Real-time Retargeting of Deictic Motion to Virtual Avatars for Augmented Reality Telepresence
    Kang, Jiho
    Yang, Dongseok
    Kim, Taehei
    Lee, Yewon
    Lee, Sung-Hee
    2023 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY, ISMAR, 2023, : 885 - 893
  • [2] Real-Time 3D Avatars for Tele-rehabilitation in Virtual Reality
    Kurillo, Gregorij
    Koritnik, Tomaz
    Bajd, Tadej
    Bajcsy, Ruzena
    MEDICINE MEETS VIRTUAL REALITY 18, 2011, 163 : 290 - 296
  • [3] Technologies Integration of Immersive Virtual Reality on smartphones with real-time motion capture
    Braga, Marlon Dantas
    Mota, Guilherme Lucio A.
    Moreira da Costa, Rosa Maria E.
    2016 18TH SYMPOSIUM ON VIRTUAL AND AUGMENTED REALITY (SVR 2016), 2016, : 127 - 134
  • [4] Real-time Depth Estimation for Aerial Panoramas in Virtual Reality
    Xu, Di
    Liu, Xiaojun
    Zhang, Yanning
    2020 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES WORKSHOPS (VRW 2020), 2020, : 705 - 706
  • [5] Real-time face pose estimation
    McKenna, SJ
    Gong, S
    REAL-TIME IMAGING, 1998, 4 (05) : 333 - 347
  • [6] Exploring Effective Real-Time Ergonomic Guidance Methods for Immersive Virtual Reality Workspaces
    Ji, Ruihua
    Chang, Zhuang
    Wang, Shuxia
    Billinghurst, Mark
    EXTENDED ABSTRACTS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,
  • [7] Real-Time Monocular Segmentation and Pose Tracking of Multiple Objects
    Tjaden, Henning
    Schwanecke, Ulrich
    Schoemer, Elmar
    COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908 : 423 - 438
  • [8] Real-Time Dense Monocular SLAM for Augmented Reality
    Luo, Hongcheng
    Xue, Tangli
    Yang, Xin
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1237 - 1238
  • [9] Real-time Monocular Dense Mapping for Augmented Reality
    Xue, Tangli
    Luo, Hongcheng
    Cheng, Danpeng
    Yuan, Zikang
    Yang, Xin
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 510 - 518
  • [10] Real-Time Object Pose Estimation with Pose Interpreter Networks
    Wu, Jimmy
    Zhou, Bolei
Russell, Rebecca
    Kee, Vincent
    Wagner, Syler
    Hebert, Mitchell
    Torralba, Antonio
    Johnson, David M. S.
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 6798 - 6805