Local 3D Editing via 3D Distillation of CLIP Knowledge

Cited by: 3
Authors
Hyung, Junha [1, 2]
Hwang, Sungwon [1]
Kim, Daejin [3]
Lee, Hyunji [1]
Choo, Jaegul [1]
Affiliations
[1] KAIST AI, Daejeon, South Korea
[2] Kakao Enterprise Corp, Seongnam, South Korea
[3] Scatter Lab, Seoul, South Korea
DOI
10.1109/CVPR52729.2023.01219
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
3D content manipulation is an important computer vision task with many real-world applications (e.g., product design, cartoon generation, and 3D avatar editing). Recently proposed 3D GANs can generate diverse, photorealistic 3D-aware content using Neural Radiance Fields (NeRF). However, manipulating NeRF remains challenging because visual quality tends to degrade after manipulation and because suboptimal control handles, such as 2D semantic maps, are used. While text-guided manipulation has shown potential in 3D editing, such approaches often lack locality. To overcome these problems, we propose Local Editing NeRF (LENeRF), which requires only text inputs for fine-grained, localized manipulation. Specifically, we present three add-on modules of LENeRF, the Latent Residual Mapper, the Attention Field Network, and the Deformation Network, which are jointly used for local manipulation of 3D features by estimating a 3D attention field. The 3D attention field is learned in an unsupervised way, by distilling the zero-shot mask generation capability of CLIP into 3D space with multi-view guidance. We conduct diverse experiments and thorough evaluations, both quantitative and qualitative.
Pages: 12674-12684
Page count: 11