Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal

Cited: 2
Authors
Kaukab, Shaghaf [1 ]
Komal
Ghodki, Bhupendra M. [2 ]
Ray, Hena [3 ]
Kalnar, Yogesh B. [1 ]
Narsaiah, Kairam [4 ]
Brar, Jaskaran S. [1 ]
Affiliations
[1] ICAR Res Complex, Cent Inst Postharvest Engn & Technol, Ludhiana 141004, India
[2] Indian Inst Technol Kharagpur, Agr & Food Engn Dept, Kharagpur 721302, India
[3] Ctr Dev Adv Comp, Kolkata 700091, India
[4] Indian Council Agr Res, Div Agr Engn, New Delhi 110012, India
Keywords
Apple; Fruit detection; 3D localization; YOLO network; RGB-D images; Depth sensor; Faster R-CNN; RGB; Localization; Red
DOI
10.1016/j.ecoinf.2024.102691
CLC number
Q14 [Ecology (Bioecology)]
Subject classification codes
071012; 0713
Abstract
In automated fruit detection, RGB-Depth (RGB-D) images supply the detection model with additional depth information to enhance detection accuracy. However, outdoor depth images are usually of low quality, which limits the usefulness of the depth data. In this study, a technique for real-time apple fruit detection in a high-density orchard environment using multi-modal data is presented. A non-targeted background removal with depth fusion (NBR-DF) method was developed to reduce the heavy noise in depth images, which arises from uncontrolled lighting and from holes with incomplete depth information. The NBR-DF technique follows three primary steps: pre-processing of depth images (point cloud generation), target object extraction, and background removal. It serves as a pipeline that pre-processes multi-modal data and enhances the features of depth images by filling holes, thereby eliminating the noise those holes generate. Implemented with YOLOv5, NBR-DF enhances detection accuracy in dense orchard conditions by using multi-modal information as input. An attention-based depth fusion module that adaptively fuses the multi-modal features was also developed. The depth-attention matrix is computed with pooling operations and sigmoid normalization, both efficient methods for summarizing and normalizing depth information. The fusion module improves the identification of multiscale objects and strengthens the network's robustness to noise. The network then detects fruit positions using multiscale information from RGB-D images in highly complex orchard environments. The detection results were compared and validated against methods using different input modalities and fusion strategies. The NBR-DF approach achieved an average precision of 0.964 in real time. Performance comparisons with other state-of-the-art methods and a model generalization study further establish that the depth-fusion attention mechanism and the pre-processing steps of NBR-DF-YOLOv5 significantly outperform them. In conclusion, the developed NBR-DF technique shows potential to improve real-time apple fruit detection using multi-modal information.
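To make the three NBR-DF steps concrete, the following is a minimal sketch of a hole-filling and depth-gated background-removal pass using OpenCV and NumPy. It is not the authors' implementation: the point-cloud generation step is omitted, and the depth thresholds and inpainting radius are hypothetical values for illustration.

```python
# Minimal sketch of an NBR-DF-style pre-processing pass.
# Not the paper's code: thresholds and inpainting radius are hypothetical.
import cv2
import numpy as np

def fill_depth_holes(depth_mm: np.ndarray) -> np.ndarray:
    """Fill zero-valued holes in a 16-bit depth map via Telea inpainting."""
    holes = (depth_mm == 0).astype(np.uint8)                # missing-depth mask
    scale = 255.0 / max(int(depth_mm.max()), 1)
    depth_8u = cv2.convertScaleAbs(depth_mm, alpha=scale)   # 8-bit for inpaint
    return cv2.inpaint(depth_8u, holes, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

def remove_background(rgb: np.ndarray, depth_8u: np.ndarray,
                      near: int = 30, far: int = 180) -> np.ndarray:
    """Keep only pixels whose rescaled depth lies in the target canopy range."""
    mask = ((depth_8u >= near) & (depth_8u <= far)).astype(np.uint8) * 255
    return cv2.bitwise_and(rgb, rgb, mask=mask)             # background removed

# Usage with an aligned RGB-D frame:
# clean_rgb = remove_background(rgb, fill_depth_holes(depth))
```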
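The depth-attention matrix described in the abstract (pooling followed by sigmoid normalization) resembles a spatial-attention block. Below is a minimal PyTorch sketch under that assumption; the convolution kernel size and the residual fusion rule are illustrative choices, not the paper's exact design.

```python
# Minimal PyTorch sketch of a depth-attention fusion block: pooled depth
# features are sigmoid-normalized into an attention matrix that re-weights
# the RGB features. Kernel size and residual rule are assumptions.
import torch
import torch.nn as nn

class DepthAttentionFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # 2 -> 1 channels: combine the avg- and max-pooled depth summaries.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        avg_pool = depth_feat.mean(dim=1, keepdim=True)   # channel-average summary
        max_pool = depth_feat.amax(dim=1, keepdim=True)   # channel-max summary
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        # Depth attention gates the RGB features; the residual term preserves
        # information where the (noisy) depth attention is close to zero.
        return rgb_feat * attn + rgb_feat

# fuse = DepthAttentionFusion()
# fused = fuse(rgb_feat, depth_feat)  # both shaped (N, C, H, W)
```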
Pages: 13