Recognition of 3D Object Based on Multi-View Recurrent Neural Networks

Cited by: 0
Authors:
Dong S. [1 ]
Li W.-S. [1 ]
Zhang W.-Q. [1 ]
Zou K. [1 ]
Affiliations:
[1] Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan, 528406, Guangdong
Keywords:
3D object; Feature extraction; Feature fusion; Image retrieval; Multi-view
DOI: 10.12178/1001-0548.2019017
Abstract:
Multi-view convolutional neural networks (MVCNN) are more accurate and faster at 3D object recognition tasks than methods based on state-of-the-art 3D shape descriptors. However, the inputs of MVCNN are views rendered from cameras at fixed positions, which is not the case in most applications. Furthermore, MVCNN uses a max-pooling operation to fuse multi-view features, so information in the original features may be lost. To address these two problems, a new recognition method for 3D objects based on multi-view recurrent neural networks (MVRNN) is proposed, improving on MVCNN in three aspects. First, a new term, defined as a measure of discrimination, is introduced into the cross-entropy loss function to enhance the discrimination between features of different objects. Second, a recurrent neural network (RNN) is used to fuse multi-view features captured from free positions into one compact feature, replacing the max-pooling operation in MVCNN; the RNN preserves the completeness of the appearance information. Finally, a single-view feature from a free position is matched against the fused features via a binary classification network to attain fine-grained recognition of 3D objects. Experiments are conducted on the public dataset ModelNet and the private dataset MV3D to validate the performance of MVRNN. The results show that MVRNN extracts multi-view features with a higher degree of discrimination and achieves higher accuracy than MVCNN on both datasets. © 2020, Editorial Board of Journal of the University of Electronic Science and Technology of China. All rights reserved.
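The core contrast the abstract draws — an RNN folding per-view features into one compact descriptor versus MVCNN's order-insensitive max-pooling — can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's actual network: the feature dimensions, the vanilla-RNN cell, and all weight matrices here are illustrative assumptions.

```python
import numpy as np

def fuse_views_rnn(view_feats, W_x, W_h, b):
    """Sequentially fold per-view feature vectors into one hidden state,
    as an RNN-style alternative to max-pooling (toy vanilla-RNN cell)."""
    h = np.zeros(W_h.shape[0])
    for v in view_feats:          # one feature vector per rendered view
        h = np.tanh(W_x @ v + W_h @ h + b)
    return h                      # compact fused descriptor

def fuse_views_maxpool(view_feats):
    """MVCNN-style baseline: element-wise max over views (order-invariant,
    but per-view detail below the maximum is discarded)."""
    return view_feats.max(axis=0)

# Illustrative sizes: 12 views, 8-D per-view features (not the paper's values)
rng = np.random.default_rng(0)
views, d, hdim = 12, 8, 8
feats = rng.normal(size=(views, d))
W_x = 0.1 * rng.normal(size=(hdim, d))
W_h = 0.1 * rng.normal(size=(hdim, hdim))
b = np.zeros(hdim)

fused = fuse_views_rnn(feats, W_x, W_h, b)
pooled = fuse_views_maxpool(feats)
```

Note the design difference this exposes: the recurrent fusion sees every view's full feature vector in sequence, whereas max-pooling keeps only one value per dimension across all views.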
Pages: 269-275
Number of pages: 6