MVImgNet: A Large-scale Dataset of Multi-view Images

Cited by: 7
Authors
Yu, Xianggang [1 ,2 ]
Xu, Mutian [1 ,2 ]
Zhang, Yidan [1 ,2 ]
Liu, Haolin [1 ,2 ]
Ye, Chongjie [1 ,2 ]
Wu, Yushuang [1 ,2 ]
Yan, Zizheng [1 ,2 ]
Zhu, Chenming [1 ,2 ]
Xiong, Zhangyang [1 ,2 ]
Liang, Tianyou [1 ,2 ]
Chen, Guanying [1 ,2 ]
Cui, Shuguang [1 ,2 ]
Han, Xiaoguang [1 ,2 ]
Affiliations
[1] CUHKSZ, FNii, Shenzhen, People's Republic of China
[2] CUHKSZ, SSE, Shenzhen, People's Republic of China
Funding
National Key Research and Development Program;
DOI
10.1109/CVPR52729.2023.00883
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Being data-driven is one of the most iconic properties of deep learning algorithms. The birth of ImageNet [24] drove a remarkable trend of 'learning from large-scale data' in computer vision. Pretraining on ImageNet to obtain rich universal representations has been shown to benefit various 2D visual tasks and has become a standard in 2D vision. However, because collecting real-world 3D data is laborious, there is not yet a generic dataset serving as a counterpart of ImageNet in 3D vision, so how such a dataset could impact the 3D community remains unexplored. To fill this gap, we introduce MVImgNet, a large-scale dataset of multi-view images, which is highly convenient to obtain by shooting videos of real-world objects in daily life. It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds. The multi-view attribute endows our dataset with 3D-aware signals, making it a soft bridge between 2D and 3D vision. We conduct pilot studies probing the potential of MVImgNet on a variety of 3D and 2D visual tasks, including radiance field reconstruction, multi-view stereo, and view-consistent image understanding, where MVImgNet demonstrates promising performance, leaving many possibilities for future exploration. In addition, via dense reconstruction on MVImgNet, we derive a 3D object point cloud dataset called MVPNet, covering 87,200 samples from 150 categories, with a class label on each point cloud. Experiments show that MVPNet can benefit real-world 3D object classification while posing new challenges to point cloud understanding. MVImgNet and MVPNet will be made public, in the hope of inspiring the broader vision community.
Pages: 9150 - 9161
Page count: 12
Related papers
  • [1] MVImgNet2.0: A Larger-scale Dataset of Multi-view Images
    Wu, Yushuang
    Shi, Luyue
    Liu, Haolin
    Liao, Hongjie
    Qiu, Lingteng
    Yuan, Weihao
    Gu, Xiaodong
    Dong, Zilong
    Cui, Shuguang
    Han, Xiaoguang
    ACM Transactions on Graphics, 2024, 43 (06)
  • [2] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks
    Yao, Yao
    Luo, Zixin
    Li, Shiwei
    Zhang, Jingyang
    Ren, Yufan
    Zhou, Lei
    Fang, Tian
    Quan, Long
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 1787 - 1796
  • [3] MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
    Xiong, Zhangyang
    Li, Chenghong
    Liu, Kenkun
    Liao, Hongjie
    Hu, Jianqiao
    Zhu, Junyi
    Ning, Shuliang
    Qiu, Lingteng
    Wang, Chongjie
    Wang, Shijie
    Cui, Shuguang
    Han, Xiaoguang
    arXiv, 2023
  • [4] A Large-Scale Hierarchical Multi-View RGB-D Object Dataset
    Lai, Kevin
    Bo, Liefeng
    Ren, Xiaofeng
    Fox, Dieter
    2011 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2011, : 1817 - 1824
  • [5] Multi-sensor large-scale dataset for multi-view 3D reconstruction
    Voynov, Oleg
    Bobrovskikh, Gleb
    Karpyshev, Pavel
    Galochkin, Saveliy
    Ardelean, Andrei-Timotei
    Bozhenko, Arseniy
    Karmanova, Ekaterina
    Kopanev, Pavel
    Labutin-Rymsho, Yaroslav
    Rakhimov, Ruslan
    Safin, Aleksandr
    Serpiva, Valerii
    Artemov, Alexey
    Burnaev, Evgeny
    Tsetserukou, Dzmitry
    Zorin, Denis
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21392 - 21403
  • [6] Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
    Sener, Fadime
    Chatterjee, Dibyadip
    Shelepov, Daniel
    He, Kun
    Singhania, Dipika
    Wang, Robert
    Yao, Angela
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 21064 - 21074
  • [7] A LARGE SCALE MULTI-VIEW RGBD VISUAL AFFORDANCE LEARNING DATASET
    Khalifa, Zeyad
    Shah, Syed Afaq Ali
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1325 - 1329
  • [8] Automatic tie-points extraction for triangulation of large-scale oblique multi-view images
    Yan L.
    Fei L.
    Ye Z.
    Xia W.
    Cehui Xuebao/Acta Geodaetica et Cartographica Sinica, 2016, 45 (03): 310 - 317 and 338
  • [9] Large-Scale Multi-View Subspace Clustering in Linear Time
    Kang, Zhao
    Zhou, Wangtao
    Zhao, Zhitong
    Shao, Junming
    Han, Meng
    Xu, Zenglin
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 4412 - 4419
  • [10] Benchmarking Large-Scale Multi-View 3D Reconstruction Using Realistic Synthetic Images
    Liu, Zhuohao
    Xu, Zixuan
    Diao, Changyu
    Xing, Wei
    Lu, Dongming
    ELEVENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2019), 2020, 11373