MVImgNet: A Large-scale Dataset of Multi-view Images

Cited by: 7

Authors:

Yu, Xianggang [1,2]

Xu, Mutian [1,2]

Zhang, Yidan [1,2]

Liu, Haolin [1,2]

Ye, Chongjie [1,2]

Wu, Yushuang [1,2]

Yan, Zizheng [1,2]

Zhu, Chenming [1,2]

Xiong, Zhangyang [1,2]

Liang, Tianyou [1,2]

Chen, Guanying [1,2]

Cui, Shuguang [1,2]

Han, Xiaoguang [1,2]

Affiliations:

[1] CUHKSZ, FNii, Shenzhen, Peoples R China

[2] CUHKSZ, SSE, Shenzhen, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

National Key R&D Program of China;

Keywords:

DOI:

10.1109/CVPR52729.2023.00883

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Being data-driven is one of the most iconic properties of deep learning algorithms. The birth of ImageNet [24] drove a remarkable trend of 'learning from large-scale data' in computer vision. Pretraining on ImageNet to obtain rich universal representations has been shown to benefit various 2D visual tasks and has become a standard in 2D vision. However, because collecting real-world 3D data is laborious, there is not yet a generic dataset serving as a counterpart of ImageNet in 3D vision, so how such a dataset could impact the 3D community remains unexplored. To remedy this, we introduce MVImgNet, a large-scale dataset of multi-view images, which is highly convenient to collect by shooting videos of real-world objects in daily life. It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds. The multi-view attribute endows our dataset with 3D-aware signals, making it a soft bridge between 2D and 3D vision. We conduct pilot studies to probe the potential of MVImgNet on a variety of 3D and 2D visual tasks, including radiance field reconstruction, multi-view stereo, and view-consistent image understanding, where MVImgNet demonstrates promising performance and leaves many possibilities for future exploration. In addition, via dense reconstruction on MVImgNet, we derive a 3D object point cloud dataset called MVPNet, covering 87,200 samples from 150 categories, with a class label on each point cloud. Experiments show that MVPNet can benefit real-world 3D object classification while posing new challenges for point cloud understanding. MVImgNet and MVPNet will be made public, in the hope of inspiring the broader vision community.

Pages: 9150-9161

Page count: 12

Related records: 50 in total

[41] Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
Li, Zhen
Wang, Lingli
Cheng, Mofang
Pan, Cihui
Yang, Jiaqi
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12499 - 12509
[42] Prediction of chemical reaction yields with large-scale multi-view pre-training
Shi, Runhan
Yu, Gufeng
Huo, Xiaohong
Yang, Yang
JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
[43] Large-Scale Multi-View Clustering via Fast Essential Subspace Representation Learning
Zheng, Qinghai
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1893 - 1897
[44] Prediction of chemical reaction yields with large-scale multi-view pre-training
Runhan Shi
Gufeng Yu
Xiaohong Huo
Yang Yang
Journal of Cheminformatics, 16
[45] Uncertainty-Guided Depth Fusion from Multi-View Satellite Images to Improve the Accuracy in Large-Scale DSM Generation
Qin, Rongjun
Ling, Xiao
Farella, Elisa Mariarosaria
Remondino, Fabio
REMOTE SENSING, 2022, 14 (06)
[46] Coding of multi-view images
Palfner, T
Müller, E
STEREOSCOPIC DISPLAYS AND VIRTUAL REALITY SYSTEMS XI, 2004, 5291 : 47 - 58
[47] Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation
Robert, Damien
Vallet, Bruno
Landrieu, Loic
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5565 - 5574
[48] Distributed Refinement of Large-Scale 3D Mesh for Accurate Multi-View Reconstruction
Luo, Qing
Li, Yao
Qi, Yue
2018 8TH INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV), 2018, : 58 - 61
[49] Multi-view stereo for large-scale scene reconstruction with MRF-based depth inference
Sun, Shang
Xu, Dan
Wu, Hao
Ying, Haocong
Mou, Yurui
COMPUTERS & GRAPHICS-UK, 2022, 106 : 248 - 258
[50] Center consistency guided multi-view embedding anchor learning for large-scale graph clustering
Zhang, Xinyue
Ren, Zhenwen
Yang, Chao
KNOWLEDGE-BASED SYSTEMS, 2023, 260
