YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism

Cited by: 14
Authors
Liu, Cheng [1 ,2 ]
Feng, Qingchun [1 ,3 ]
Sun, Yuhuan [1 ,3 ]
Li, Yajun [1 ,3 ]
Ru, Mengfei [1 ,3 ]
Xu, Lijia [2 ]
Affiliations
[1] Beijing Acad Agr & Forestry Sci, Intelligent Equipment Res Ctr, Beijing 100097, Peoples R China
[2] Sichuan Agr Univ, Coll Mech & Elect Engn, Yaan 625014, Peoples R China
[3] Beijing Key Lab Intelligent Equipment Technol Agr, Beijing 100097, Peoples R China
Keywords
Multimodal fusion; Attention mechanism; YOLACT; Tomato main-stem; Multimodal loss function; CLASSIFICATION;
DOI
10.1016/j.compag.2023.108186
Chinese Library Classification
S [Agricultural Sciences]
Discipline classification code
09
Abstract
The tomato plant's main-stem is a feasible guide for robots searching for the discretely growing targets of harvesting, pruning, or pollinating. Owing to the strong reflection characteristics of the main-stem in the near-infrared (NIR) waveband, this study proposes a multimodal hierarchical fusion method (YOLACTFusion) based on the attention mechanism to achieve instance segmentation of the main-stem against a similar-colored background (i.e., green leaves and green fruit) in robotic vision systems. The model feeds RGB images and 900-1100 nm NIR images into two ResNet50 backbone networks and uses a parallel attention mechanism to fuse feature maps at multiple scales before passing them to the head network, improving main-stem segmentation in the RGB images. The multimodal loss function is a weighted combination of the original loss on the RGB image and the position offset loss and classification loss on the NIR image. Furthermore, local depthwise separable convolution is used in the backbone network, and Conv-BN layers are merged to reduce the computational complexity. The results show that the precision and recall of YOLACTFusion for main-stem detection reached 93.90 % and 62.60 %, respectively, and the precision and recall of instance segmentation reached 95.12 % and 63.41 %, respectively. Compared to YOLACT, the mean average precision (mAP) of YOLACTFusion increased from 39.20 % to 46.29 % and the model size was reduced from 199.03 MB to 165.52 MB, while the image processing efficiency remained similar. The overall results show that the multimodal instance segmentation method proposed in this study significantly improves the detection and segmentation of tomato main-stems against a similar-colored background, making it a potential method for improving the visual perception of agricultural robots.
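The abstract states that Conv-BN layers are merged but gives no details. The sketch below shows the standard inference-time folding of a batch-normalization layer into the preceding convolution, which is assumed here and may differ from the authors' implementation. For brevity it treats a 1x1 convolution applied to a single pixel; the function names (`fold_bn`, `conv1x1`, `batchnorm`) and all numeric values are hypothetical.

```python
import math

def conv1x1(x, weights, bias):
    """Apply a 1x1 convolution to one pixel: one dot product per output channel."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into the conv weights and bias (assumed standard method)."""
    folded_w, folded_b = [], []
    for w_row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-output-channel scale
        folded_w.append([w * scale for w in w_row])
        folded_b.append((b - m) * scale + bt)   # shifted and rescaled bias
    return folded_w, folded_b

# Illustrative values only: 3 input channels, 2 output channels.
x = [0.5, -1.0, 2.0]
weights = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.3]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]

# Two-layer reference path versus the single merged layer.
reference = batchnorm(conv1x1(x, weights, bias), gamma, beta, mean, var)
merged_w, merged_b = fold_bn(weights, bias, gamma, beta, mean, var)
merged = conv1x1(x, merged_w, merged_b)
```

Because the merged layer reproduces the two-layer output with a single multiply-accumulate pass, the normalization step adds no cost at inference time, which is consistent with the reduced computational complexity the abstract reports.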
Pages: 14
Related papers
50 in total
  • [2] RGB-NIR Color Image Fusion: Metric and Psychophysical Experiments
    Hayes, Alex E.
    Finlayson, Graham D.
    Montagna, Roberto
    IMAGE QUALITY AND SYSTEM PERFORMANCE XII, 2015, 9396
  • [3] Enhancement of dark areas on the surface of scrap metals based on RGB-NIR image fusion
    Ma, Tingtian
    Ye, Wenhua
    Li, Xinying
    He, Huanmin
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (02)
  • [4] Convolutional Simultaneous Sparse Approximation with Applications to RGB-NIR Image Fusion
    Veshki, Farshad G.
    Vorobyov, Sergiy A.
    2022 56TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2022, : 872 - 876
  • [5] Unsupervised calibration of RGB-NIR capture pairs utilizing dense multimodal image correspondences
    Gama, Filipe
    Georgiev, Mihail
    Gotchev, Atanas
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2145 - 2149
  • [6] An Instance Segmentation Method for Insulator Defects Based on an Attention Mechanism and Feature Fusion Network
    Wu, Junpeng
    Deng, Qitong
    Xian, Ran
    Tao, Xinguang
    Zhou, Zhi
    APPLIED SCIENCES-BASEL, 2024, 14 (09):
  • [7] Multimodal image semantic segmentation based on attention mechanism
    Zhang Ji-you
    Zhang Rong-fen
    Liu Yu-hong
    Yuan Wen-hao
    CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2023, 38 (07) : 975 - 984
  • [8] Enhanced machine perception by a scalable fusion of RGB-NIR image pairs in diverse exposure environments
    Kumar, Wahengbam Kanan
    Singh, Ningthoujam Johny
    Singh, Aheibam Dinamani
    Nongmeikapam, Kishorjit
    MACHINE VISION AND APPLICATIONS, 2021, 32 (04)
  • [9] Visualization Analysis of Crop Spectral Index Based on RGB-NIR Image Matching
    Sun Hong
    Xing Zi-zheng
    Zhang Zhi-yong
    Ma Xu-ying
    Long Yao-wei
    Liu Ning
    Li Min-zan
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2019, 39 (11) : 3493 - 3500
  • [10] Multi-style transfer and fusion of image's regions based on attention mechanism and instance segmentation
    Ye, Wujian
    Liu, Chaojie
    Chen, Yuehai
    Liu, Yijun
    Liu, Chenming
    Zhou, Huihui
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 110