SADNet: Generating immersive virtual reality avatars by real-time monocular pose estimation

Cited: 0
Authors
Jiang, Ling [1]
Xiong, Yuan [1]
Wang, Qianqian [1]
Chen, Tong [1]
Wu, Wei [1]
Zhou, Zhong [1,2,3]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Zhongguancun Lab, Beijing, Peoples R China
[3] Beihang Univ, POB 6863, 37 Xueyuan Rd, Beijing, Peoples R China
Keywords
3D avatar; computer animation; human pose estimation
DOI
10.1002/cav.2233
CLC Number
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Generating immersive virtual reality avatars, which maps physical human body poses onto avatars in virtual scenes for an immersive user experience, is a challenging task in VR/AR applications. However, most existing work is time-consuming and limited by its datasets, and therefore does not satisfy the immersive, real-time requirements of VR systems. In this paper, we aim to generate real-time 3D virtual reality avatars from a monocular camera to solve these problems. Specifically, we first design a self-attention distillation network (SADNet) for effective human pose estimation, guided by a pre-trained teacher. Second, we propose a lightweight pose mapping method for human avatars that utilizes the camera model to map 2D poses to 3D avatar keypoints, generating real-time human avatars with pose consistency. Finally, we integrate our framework into a VR system, displaying the generated 3D pose-driven avatars on Helmet-Mounted Display devices for an immersive user experience. We evaluate SADNet on two publicly available datasets. Experimental results show that SADNet achieves a state-of-the-art trade-off between speed and accuracy. In addition, we conducted a user experience study on the performance and immersion of the virtual reality avatars; results show that the pose-driven 3D human avatars generated by our method are smooth and attractive.
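The "self-attention distillation guided by a pre-trained teacher" mentioned in the abstract can be illustrated with a minimal sketch. This is not SADNet's published loss; the function name, the choice of MSE for both terms, and the `alpha` weight are illustrative assumptions. The idea shown is only the generic pattern: a task loss on the student's pose heatmaps plus a term pulling the student's attention maps toward a frozen teacher's.

```python
import numpy as np

def attention_distillation_loss(student_attn, teacher_attn,
                                student_heatmaps, gt_heatmaps,
                                alpha=0.5):
    """Illustrative attention-distillation objective (hypothetical, not
    the paper's loss): supervised heatmap loss + MSE between the
    student's and a frozen teacher's self-attention maps."""
    task_loss = np.mean((student_heatmaps - gt_heatmaps) ** 2)
    distill_loss = np.mean((student_attn - teacher_attn) ** 2)
    return task_loss + alpha * distill_loss
```

When the student matches both the ground-truth heatmaps and the teacher's attention exactly, the loss is zero; `alpha` trades off imitation of the teacher against the supervised task.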
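The camera-model mapping from 2D poses to 3D avatar keypoints can likewise be sketched with a standard pinhole back-projection. This is a minimal sketch under stated assumptions, not the paper's actual pose-mapping method: it assumes per-keypoint depths and the intrinsics `fx, fy, cx, cy` are available, and the function name is hypothetical.

```python
import numpy as np

def backproject_keypoints(kp_2d, depths, fx, fy, cx, cy):
    """Lift 2D pixel keypoints to 3D camera-space points using the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.

    kp_2d:  (N, 2) array of (u, v) pixel coordinates.
    depths: (N,) per-keypoint depths Z (assumed known or estimated).
    """
    u, v = kp_2d[:, 0], kp_2d[:, 1]
    z = np.asarray(depths, dtype=float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) camera-space points
```

For example, the principal point back-projects onto the optical axis: a keypoint at `(cx, cy)` with depth 2 m maps to `(0, 0, 2)` in camera space.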
Pages: 15
Related Papers
50 records in total
  • [41] Teaching ASL Signs using Signing Avatars and Immersive Learning in Virtual Reality
    Quandt, Lorna; Lamberton, Jason; Willis, Athena S.; Wang, Jianye; Weeks, Kaitlyn; Kubicek, Emily; Malzkuhn, Melissa
    22nd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '20), 2020
  • [42] An Immersive Telepresence System using a Real-Time Omnidirectional Camera and a Virtual Reality Head-Mounted Display
    Gaemperle, Luis; Seyid, Kerem; Popovic, Vladan; Leblebici, Yusuf
    2014 IEEE International Symposium on Multimedia (ISM), 2014: 175-178
  • [43] Improved Stakeholder Communication and Visualizations: Real-Time Interaction and Cost Estimation within Immersive Virtual Environments
    Balali, Vahid; Noghabaei, Mojtaba; Heydarian, Arsalan; Han, Kevin
    Construction Research Congress 2018: Construction Information Technology, 2018: 522-530
  • [44] Towards Real-Time Monocular Depth Estimation For Mobile Systems
    Deldjoo, Yashar; Di Noia, Tommaso; Di Sciascio, Eugenio; Pernisco, Gaetano; Reno, Vito; Stella, Ettore
    Multimodal Sensing and Artificial Intelligence: Technologies and Applications II, 2021, 11785
  • [45] OptiDepthNet: A Real-Time Unsupervised Monocular Depth Estimation Network
    Wei, Feng; Yin, XingHui; Shen, Jie; Wang, HuiBin
    Wireless Personal Communications, 2023, 128 (04): 2831-2846
  • [46] Real-Time Depth Estimation from a Monocular Moving Camera
    Handa, Aniket; Sharma, Prateek
    Contemporary Computing, 2012, 306: 494-495
  • [47] Towards real-time unsupervised monocular depth estimation on CPU
    Poggi, Matteo; Aleotti, Filippo; Tosi, Fabio; Mattoccia, Stefano
    2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018: 5848-5854
  • [48] Real-time monocular depth estimation with adaptive receptive fields
    Ji, Zhenyan; Song, Xiaojun; Guo, Xiaoxuan; Wang, Fangshi; Armendariz-Inigo, Jose Enrique
    Journal of Real-Time Image Processing, 2021, 18 (04): 1369-1381
  • [49] High-fidelity pose estimation for real-time extended reality (XR) visualization for cardiac catheterization
    Annabestani, Mohsen; Sriram, Sandhya; Caprio, Alexandre; Janghorbani, Sepehr; Wong, S. Chiu; Sigaras, Alexandros; Mosadegh, Bobak
    Scientific Reports, 14 (1)
  • [50] G2O-Pose: Real-Time Monocular 3D Human Pose Estimation Based on General Graph Optimization
    Sun, Haixun; Zhang, Yanyan; Zheng, Yijie; Luo, Jianxin; Pan, Zhisong
    Sensors, 2022, 22 (21)