MULTI-VIEW 3D RECONSTRUCTION FROM VIDEO WITH TRANSFORMER

Authors
Zhong, Yijie [1]
Sun, Zhengxing [1]
Sun, Yunhan [1]
Luo, Shoutong [1]
Wang, Yi [1]
Zhang, Wei [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Multi-view 3D reconstruction; Sequential modeling; Transformer-based model
DOI
10.1109/ICIP46576.2022.9897753
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-view 3D reconstruction is the basis for many other applications in computer vision. Video provides multi-view images and temporal information, which can help achieve better reconstruction. Handling the redundant information in video, and extracting and fusing multi-view features, become the key issues in extracting the shape prior for reconstruction. In this paper, inspired by the recent success of Transformer models, we propose a transformer-based 3D reconstruction network. We formulate multi-view 3D reconstruction as three parts: a frame encoder, a fusion module, and a shape decoder. We apply several special-purpose tokens and perform the fusion progressively in the encoder phase, which we call the patch-level progressive fusion module. These tokens progressively describe which part of the object each frame should focus on and the local structural detail. We then design a transformer fusion module to aggregate the structure information. Finally, multi-head attention is used to build a transformer-based decoder that reuses the shallow features from the encoder. In experiments, our method not only achieves competitive performance but also has low model complexity and computation cost.
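The three-part formulation in the abstract (frame encoder, fusion module, shape decoder) can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the abstract gives no layer sizes, token counts, or output representation, so everything below is an assumption made for illustration, including the function names (`frame_encoder`, `fuse_views`, `shape_decoder`, `reconstruct`), the single-head attention used in place of multi-head attention, the random untrained weights, the 8x8 patches, the three fusion tokens, and the 8x8x8 occupancy grid.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention; q: (Nq, d), k and v: (Nk, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def frame_encoder(frames, w_patch, patch=8):
    """Cut each frame into patches and project them linearly to tokens.
    frames: (V, H, W) -> tokens: (V, P, d). Hypothetical stand-in encoder."""
    v, h, w = frames.shape
    x = frames.reshape(v, h // patch, patch, w // patch, patch)
    x = x.transpose(0, 1, 3, 2, 4).reshape(v, -1, patch * patch)
    return x @ w_patch

def fuse_views(tokens, fusion_tokens):
    """A few shared tokens attend to each view in turn, accumulating
    information frame by frame (an assumed reading of 'progressive')."""
    for view_tokens in tokens:
        fusion_tokens = fusion_tokens + attention(fusion_tokens, view_tokens, view_tokens)
    return fusion_tokens

def shape_decoder(fused, shallow, voxel_queries, w_out):
    """Voxel queries attend to fused tokens plus shallow encoder tokens;
    a linear head and a sigmoid give per-voxel occupancy in [0, 1]."""
    memory = np.concatenate([fused, shallow], axis=0)
    feats = attention(voxel_queries, memory, memory)
    return 1.0 / (1.0 + np.exp(-(feats @ w_out)))

def reconstruct(frames, d=32, n_fusion=3, grid=8, seed=0):
    """Run the toy pipeline end to end with random (untrained) weights."""
    rng = np.random.default_rng(seed)
    w_patch = rng.normal(scale=0.1, size=(64, d))        # 8x8 patch -> d
    fusion_tokens = rng.normal(scale=0.1, size=(n_fusion, d))
    voxel_queries = rng.normal(scale=0.1, size=(grid ** 3, d))
    w_out = rng.normal(scale=0.1, size=(d,))
    tokens = frame_encoder(frames, w_patch)               # (V, P, d)
    fused = fuse_views(tokens, fusion_tokens)             # (n_fusion, d)
    shallow = tokens.mean(axis=0)                         # (P, d), assumed skip features
    occ = shape_decoder(fused, shallow, voxel_queries, w_out)
    return occ.reshape(grid, grid, grid)

frames = np.random.default_rng(1).random((4, 32, 32))    # 4 grayscale video frames
occupancy = reconstruct(frames)
print(occupancy.shape)
```

The sketch only shows how tensors could flow between the three stages; a real model would add positional encodings, multiple heads, stacked layers, and training against ground-truth shapes.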
Pages: 1661-1665
Page count: 5