Learning Disentangled Representation for Multi-View 3D Object Recognition

Cited by: 23

Authors:

Huang, Jingjia [1]

Yan, Wei [1]

Li, Ge [1]

Li, Thomas [2]

Liu, Shan [3]

Affiliations:

[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China

[2] Peking Univ, AIIT, Hangzhou 100871, Peoples R China

[3] Tencent Media Lab, Palo Alto, CA 94301 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Vol. 32 / No. 02

Keywords:

Three-dimensional displays; Solid modeling; Feature extraction; Task analysis; Computer architecture; Object recognition; Computational modeling; Multi-view 3D object; object recognition; disentangled representation; FEATURES;

DOI:

10.1109/TCSVT.2021.3062190

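Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification code: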

0808 ; 0809 ;

Abstract:

3D object recognition is an active research topic. In particular, view-based methods, which represent a 3D object by a collection of its rendered 2D views, play an important role in this field. Current view-based approaches tend to aggregate information from multiple views via pooling-based strategies to make the models invariant to view permutation, at the cost of an inevitable loss of useful features. In this paper, we introduce a new method that learns a more comprehensive descriptor for a 3D object from its views while keeping its robustness to variations in view permutation. Our method disentangles the information in the set of multi-view images into a global category-related feature and a set of view-permutation-related features. To separate these two parts, an encoder-decoder-based disentangling architecture is proposed, which adds little extra computation compared with the baseline model. Systematic experiments on the ModelNet40, ModelNet10, and ShapeNetCore55 datasets demonstrate the effectiveness and competitive performance of the method. Code for our paper will be released soon at "https://github.com/hjjpku/multi_view_sort".
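The trade-off the abstract attributes to pooling-based aggregation can be illustrated with a minimal sketch. This is not the authors' architecture or code: the function name `max_pool_views` and all feature values are hypothetical, chosen only to show that an element-wise max over per-view features ignores view order, and that distinct view sets can collapse to the same descriptor.

```python
import random


def max_pool_views(view_features):
    """Element-wise max over a list of per-view feature vectors."""
    return [max(column) for column in zip(*view_features)]


# Hypothetical per-view features: 4 views, 3 dimensions each (made-up numbers).
views = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.5],
    [0.4, 0.8, 0.6],
    [0.2, 0.1, 0.0],
]
pooled = max_pool_views(views)

# Permutation invariance: shuffling the view order leaves the result unchanged.
shuffled = views[:]
random.Random(0).shuffle(shuffled)
pooled_shuffled = max_pool_views(shuffled)

# Information loss: a different (hypothetical) view set pools to the same descriptor.
other_views = [[0.7, 0.9, 0.6], [0.0, 0.0, 0.0]]
pooled_other = max_pool_views(other_views)

print(pooled, pooled_shuffled, pooled_other)
```

All three pooled descriptors coincide here, which is the kind of lost per-view information that, according to the abstract, the disentangled representation aims to retain.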

Pages: 646-659

Page count: 14