MULTI-VIEW 3D RECONSTRUCTION FROM VIDEO WITH TRANSFORMER

Times cited: 0

Authors:

Zhong, Yijie [1]

Sun, Zhengxing [1]

Sun, Yunhan [1]

Luo, Shoutong [1]

Wang, Yi [1]

Zhang, Wei [1]

Affiliation:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2022

Keywords:

Multi-view 3D reconstruction; Sequential modeling; Transformer-based model

DOI:

10.1109/ICIP46576.2022.9897753

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-view 3D reconstruction is the foundation for many other applications in computer vision. Video provides both multi-view images and temporal information, which can help achieve better reconstruction. Handling redundant information in video, together with multi-view feature extraction and fusion, therefore becomes the key issue in extracting shape priors for reconstruction. In this paper, inspired by the recent success of Transformer models, we propose a transformer-based 3D reconstruction network. We decompose multi-view 3D reconstruction into three parts: a frame encoder, a fusion module, and a shape decoder. We introduce several special-purpose tokens and perform fusion progressively in the encoder phase; we call this the patch-level progressive fusion module. These tokens progressively capture which part of the object each frame should focus on, as well as local structural detail. We then design a transformer fusion module to aggregate the structural information. Finally, multi-head attention is used to build a transformer-based decoder that reuses shallow features from the encoder. Experiments show that our method not only achieves competitive performance but also has low model complexity and computational cost.
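The abstract gives no implementation details, so the following is only a rough, hypothetical sketch of the kind of attention operations such a pipeline could rely on, not the authors' network. It assumes a single fusion token carried from frame to frame (the paper uses several tokens), omits learned Q/K/V projections, MLPs, and normalization, skips the separate transformer fusion module, and uses random vectors in place of a real frame encoder. The names `progressive_fusion`, `multi_head_attention`, and `voxel_query` are illustrative inventions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]


def multi_head_attention(query, keys, values, num_heads):
    """Split vectors into equal head slices, attend per head, concatenate.
    Learned Q/K/V/output projections are omitted to keep the sketch short."""
    d = len(query)
    assert d % num_heads == 0, "dimension must divide evenly into heads"
    h = d // num_heads
    out = []
    for i in range(num_heads):
        s = slice(i * h, (i + 1) * h)
        out.extend(attention(query[s], [k[s] for k in keys], [v[s] for v in values]))
    return out


def progressive_fusion(frames, fusion_token, num_heads=2):
    """Hypothetical progressive fusion: one token is carried from frame to frame.
    At each frame it attends over that frame's patch tokens (and itself) and
    is updated with a residual connection, accumulating cross-view information."""
    token = list(fusion_token)
    for patches in frames:
        context = patches + [token]
        update = multi_head_attention(token, context, context, num_heads)
        token = [t + u for t, u in zip(token, update)]
    return token


# Toy usage: random vectors stand in for a real frame encoder's patch features.
random.seed(0)
DIM, N_FRAMES, N_PATCHES = 8, 3, 4
frames = [[[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_PATCHES)]
          for _ in range(N_FRAMES)]
fused = progressive_fusion(frames, [0.0] * DIM)

# Decoder-style step: a hypothetical voxel query attends to the shallow patch
# features of the first frame plus the fused token, i.e. feature reuse
# implemented with attention rather than concatenation.
voxel_query = [random.gauss(0.0, 1.0) for _ in range(DIM)]
memory = frames[0] + [fused]
decoded = multi_head_attention(voxel_query, memory, memory, num_heads=2)
print(len(fused), len(decoded))
```

Carrying the token sequentially, rather than attending over all frames at once, is one plausible reading of "progressive" fusion: per-step cost depends on the number of patches in a single frame, not on the whole video.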

Pages: 1661-1665

Page count: 5