Noise-tolerant RGB-D feature fusion network for outdoor fruit detection

Cited by: 24
Authors
Sun, Qixin [1 ,2 ]
Chai, Xiujuan [1 ,2 ]
Zeng, Zhikang [3 ]
Zhou, Guomin [1 ,2 ]
Sun, Tan [1 ,2 ]
Institutions
[1] Chinese Acad Agr Sci, Agr Informat Inst, Beijing 100081, Peoples R China
[2] Minist Agr & Rural Affairs, Key Lab Agr Big Data, Beijing 100081, Peoples R China
[3] Guangxi Acad Agr Sci, Agr Sci & Technol Informat Res Inst, Nanning 530007, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal; Feature fusion; Attention mechanism; Object detection; Convolutional neural network; APPLE DETECTION; COLOR; RED;
DOI
10.1016/j.compag.2022.107034
Chinese Library Classification
S [Agricultural Sciences];
Subject Classification Code
09;
Abstract
In the process of farm automation, fruit detection is the basis of, and a prerequisite for, yield prediction, automatic picking, and other orchard operations. RGB images capture only two-dimensional information about the scene, which is insufficient to distinguish fruits that grow densely or are occluded by branches and leaves. With the development of depth sensors, using RGB-D images with complementary information can boost fruit detection performance. However, owing to sensor characteristics and scene configurations, outdoor depth images are of poor quality, which poses a challenge when fusing RGB-D features. Therefore, this paper proposes an end-to-end RGB-D object detection network, termed the noise-tolerant feature fusion network (NT-FFN), to exploit outdoor multi-modal data properly and improve detection accuracy. Specifically, the NT-FFN first uses two structurally identical feature extractors to extract single-modal (color and depth) features, which form the basis for the subsequent feature fusion. Then, to avoid introducing excessive depth noise and to focus perception on the important parts of the features, an attention-based fusion module is designed to adaptively fuse the multi-modal features. Finally, multi-scale features from the color images and the fusion modules are used to predict object positions, which not only improves the network's ability to detect multi-scale objects but also further enhances its noise immunity. In addition, this paper constructs an RGB-D citrus fruit dataset, which supports a comprehensive evaluation of the proposed network. On this dataset, the NT-FFN achieves an AP50 of 95.4% at real-time speed, outperforming single-modal methods, common multi-modal fusion strategies, and advanced multi-modal detection methods. The proposed NT-FFN also achieves excellent results on other fruit detection tasks, which verifies its generalization ability.
This study provides the possibility and a foundation for performing multi-modal information fusion in outdoor fruit detection.
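The abstract describes an attention-based fusion module that adaptively weights depth features before combining them with color features, so that noisy depth channels are suppressed. The paper's exact module design is not given here; the following is a minimal sketch, assuming an SE-style channel-attention gate in which per-channel weights are computed from globally pooled depth features (the function `attention_fuse` and its parameters `w`, `b` are hypothetical names, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_fuse(color_feat, depth_feat, w, b):
    """Gated channel-attention fusion of RGB-D features (illustrative sketch).

    color_feat, depth_feat: feature maps of shape (C, H, W).
    w, b: learnable parameters of a per-channel linear gate, shapes (C, C) and (C,).
    The gate is computed from globally pooled depth features, so channels
    dominated by depth noise can be down-weighted before fusion.
    """
    # Global average pooling over the spatial dimensions -> (C,)
    pooled = depth_feat.mean(axis=(1, 2))
    # Per-channel attention weights squashed into (0, 1)
    gate = sigmoid(w @ pooled + b)
    # Re-weight depth features channel-wise and add them to the color stream
    fused = color_feat + gate[:, None, None] * depth_feat
    return fused, gate
```

In a full network, this fusion would be applied at several scales, with the fused maps (plus the color-only maps) feeding the detection head, mirroring the multi-scale prediction described above.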
Pages: 13