Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal

Cited by: 2
Authors
Kaukab, Shaghaf [1 ]
Komal
Ghodki, Bhupendra M. [2 ]
Ray, Hena [3 ]
Kalnar, Yogesh B. [1 ]
Narsaiah, Kairam [4 ]
Brar, Jaskaran S. [1 ]
Affiliations
[1] ICAR Res Complex, Cent Inst Postharvest Engn & Technol, Ludhiana 141004, India
[2] Indian Inst Technol Kharagpur, Agr & Food Engn Dept, Kharagpur 721302, India
[3] Ctr Dev Adv Comp, Kolkata 700091, India
[4] Indian Council Agr Res, Div Agr Engn, New Delhi 110012, India
Keywords
Apple; Fruit detection; 3D localization; YOLO network; RGB-D images; Depth sensor; Faster R-CNN; RGB; Localization; Red
DOI
10.1016/j.ecoinf.2024.102691
Chinese Library Classification
Q14 [Ecology (Bioecology)];
Discipline Classification Codes
071012; 0713;
Abstract
In automated fruit detection, RGB-Depth (RGB-D) images supply the detection model with additional depth information to enhance detection accuracy. However, outdoor depth images are usually of low quality, which limits the usefulness of the depth data. This study presents a technique for real-time apple fruit detection in a high-density orchard environment using multi-modal data. A non-targeted background removal with depth fusion (NBR-DF) method was developed to reduce the heavy noise in depth images, which arises from uncontrolled lighting conditions and from holes with incomplete depth information. The NBR-DF technique follows three primary steps: pre-processing of depth images (point cloud generation), target object extraction, and background removal. It serves as a pipeline that pre-processes the multi-modal data, enhancing depth-image features by filling holes and thereby eliminating the noise those holes generate. Implemented with YOLOv5, NBR-DF improves detection accuracy in dense orchard conditions by using multi-modal information as input. An attention-based depth fusion module that adaptively fuses the multi-modal features was also developed; the depth-attention matrix is computed with pooling operations and sigmoid normalization, both efficient methods for summarizing and normalizing depth information. The fusion module improves the identification of multiscale objects and strengthens the network's robustness to noise. The network then detects fruit positions using multiscale information from the RGB-D images in highly complex orchard environments. The detection results were compared and validated against methods using different input modalities and fusion strategies: the NBR-DF approach achieved an average precision of 0.964 in real time.
The performance comparison with other state-of-the-art methods and a model-generalization study further establish that the depth-fusion attention mechanism and the effective pre-processing steps in NBR-DF-YOLOv5 significantly outperform them. In conclusion, the developed NBR-DF technique shows potential to improve real-time apple fruit detection using multi-modal information.
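The hole-filling step of the NBR-DF pipeline can be illustrated with a minimal sketch. This is an assumption-laden stand-in, not the paper's procedure: it takes a depth map stored as a NumPy array where holes are zero-valued pixels and propagates neighbouring depth values inward, whereas the actual method works through point cloud generation and target extraction.

```python
import numpy as np

def fill_depth_holes(depth, max_iters=100):
    """Fill zero-valued holes in a depth map by iteratively assigning
    each hole pixel the mean of its valid (non-zero) 4-neighbours.
    A simple illustrative stand-in for NBR-DF's hole-filling step."""
    d = depth.astype(float).copy()
    for _ in range(max_iters):
        holes = d == 0
        if not holes.any():
            break
        # Zero-pad, then gather the four shifted neighbour maps
        padded = np.pad(d, 1)
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        valid = neigh > 0
        count = valid.sum(axis=0)
        # Only fill holes that have at least one valid neighbour
        fillable = holes & (count > 0)
        d[fillable] = neigh.sum(axis=0)[fillable] / count[fillable]
    return d
```

Iterating lets depth values diffuse into larger holes from their rims, which is the qualitative behaviour the pre-processing step needs before fusion.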
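The depth-attention matrix described above (pooling followed by sigmoid normalization) can be sketched as follows. The feature shapes, pooling size, and residual-style fusion are illustrative assumptions made for this sketch, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_attention_fuse(rgb_feat, depth_feat, pool=2):
    """Sketch of attention-style depth fusion: average-pool the depth
    feature map, squash it with a sigmoid into [0, 1] attention
    weights, upsample back, and use the weights to modulate the RGB
    features. Assumes h and w are divisible by `pool`."""
    h, w = depth_feat.shape
    # Average pooling over pool x pool blocks summarizes local depth
    pooled = depth_feat.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
    # Sigmoid normalization maps the summary to attention weights in (0, 1)
    attn = sigmoid(pooled)
    # Nearest-neighbour upsample back to the feature resolution
    attn_full = np.repeat(np.repeat(attn, pool, axis=0), pool, axis=1)
    # Reweight the RGB features, keeping a residual path so that
    # low-attention regions are attenuated rather than zeroed out
    return rgb_feat * attn_full + rgb_feat
```

The residual path is one common way such a module stays robust to noisy depth: where depth is unreliable, the RGB features still pass through largely unchanged.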
Pages: 13