YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism

Cited by: 14
Authors
Liu, Cheng [1 ,2 ]
Feng, Qingchun [1 ,3 ]
Sun, Yuhuan [1 ,3 ]
Li, Yajun [1 ,3 ]
Ru, Mengfei [1 ,3 ]
Xu, Lijia [2 ]
Affiliations
[1] Beijing Acad Agr & Forestry Sci, Intelligent Equipment Res Ctr, Beijing 100097, Peoples R China
[2] Sichuan Agr Univ, Coll Mech & Elect Engn, Yaan 625014, Peoples R China
[3] Beijing Key Lab Intelligent Equipment Technol Agr, Beijing 100097, Peoples R China
Keywords
Multimodal fusion; Attention mechanism; YOLACT; Tomato main-stem; Multimodal loss function; Classification
DOI
10.1016/j.compag.2023.108186
CLC number
S [Agricultural Sciences]
Discipline code
09
Abstract
The tomato plant's main-stem is a feasible visual cue for robots searching for the discretely growing targets of harvesting, pruning, or pollination. Owing to the distinctive reflection characteristics of the main-stem in the near-infrared (NIR) waveband, this study proposes an attention-based multimodal hierarchical fusion method (YOLACTFusion) to achieve instance segmentation of the main-stem against similar-colored interference (i.e., green leaves and green fruits) in robotic vision systems. The model feeds RGB images and 900-1100 nm NIR images into two ResNet50 backbone networks and uses a parallel attention mechanism to fuse feature maps at multiple scales before they enter the head network, improving the segmentation of the main-stem in RGB images. The multimodal loss function weights the original loss computed on the RGB image together with the position-offset loss and classification loss computed on the NIR image. Furthermore, depthwise separable convolutions are used locally in the backbone network, and Conv-BN layers are merged to reduce computational complexity. The results show that the precision and recall of YOLACTFusion reached 93.90% and 62.60% for main-stem detection, and 95.12% and 63.41% for instance segmentation, respectively. Compared with YOLACT, the mean average precision (mAP) of YOLACTFusion increases from 39.20% to 46.29%, and the model size is reduced from 199.03 MB to 165.52 MB, while image-processing efficiency remains similar. Overall, the multimodal instance segmentation method proposed in this study significantly improves the detection and segmentation of tomato main-stems against a similar-colored background, offering a potential means of improving agricultural robots' visual perception.
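As an illustration of the parallel attention fusion described in the abstract, the following is a minimal PyTorch-style sketch (not the authors' code) of how same-scale RGB and NIR feature maps from the two ResNet50 backbones could be fused with channel and spatial attention running in parallel; the module name ParallelAttentionFusion, the reduction ratio, and the elementwise pre-fusion are illustrative assumptions.

# Minimal sketch (assumption, not the paper's implementation) of attention-based
# fusion of RGB and NIR feature maps at one backbone scale, using PyTorch.
import torch
import torch.nn as nn

class ParallelAttentionFusion(nn.Module):
    """Fuse same-scale RGB and NIR features with channel and spatial attention in parallel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over channel-pooled maps.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, nir_feat: torch.Tensor) -> torch.Tensor:
        x = rgb_feat + nir_feat                       # elementwise pre-fusion of the two modalities
        ca = self.channel_att(x)                      # (B, C, 1, 1) channel weights
        sa = self.spatial_att(torch.cat(              # (B, 1, H, W) spatial weights
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1))
        # Parallel attention: both weightings act on the fused map.
        return x * ca + x * sa

# Usage at one pyramid level (e.g., 256-channel maps from the two backbones):
# fused_p3 = ParallelAttentionFusion(256)(rgb_p3, nir_p3)

In a YOLACT-style model, such a fused map would take the place of the single-backbone feature at each pyramid level before the prediction head and protonet.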
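The multimodal loss weighting can likewise be sketched. The weight values and dictionary keys below are assumptions, since the abstract only states that the RGB-branch loss is combined with the position-offset and classification losses from the NIR branch.

# Hedged sketch of the multimodal loss: full YOLACT loss terms on the RGB branch
# plus box-offset and classification terms on the NIR branch. lambda_* weights
# and key names ("box", "cls") are illustrative, not values from the paper.
def multimodal_loss(rgb_losses: dict, nir_losses: dict,
                    lambda_rgb: float = 1.0,
                    lambda_nir_box: float = 0.5,
                    lambda_nir_cls: float = 0.5):
    """rgb_losses: all YOLACT terms (cls, box, mask, ...) from the RGB branch.
    nir_losses: only the 'box' (position offset) and 'cls' terms from the NIR branch."""
    loss_rgb = sum(rgb_losses.values())
    return (lambda_rgb * loss_rgb
            + lambda_nir_box * nir_losses["box"]
            + lambda_nir_cls * nir_losses["cls"])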
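The Conv-BN merging mentioned in the abstract is a standard inference-time optimization; a generic PyTorch folding helper (the name fuse_conv_bn is illustrative) looks like this:

# Fold a BatchNorm2d that follows a Conv2d into the conv's weight and bias,
# so the pair is replaced by a single convolution at inference time.
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # scale = gamma / sqrt(running_var + eps), one factor per output channel
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused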
Pages: 14
Related papers
50 records in total
  • [21] Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation
    Liu, Hengyan
    Zhang, Wenzhang
    Dai, Tianhong
    Yin, Longfei
    Ren, Guangyu
    2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024, 2024,
  • [22] FPNet: Fusion Attention Instance Segmentation Network Based On Pose Estimation
    Pi, Lei
    Wu, Jin
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 2426 - 2431
  • [23] A multimodal fusion emotion recognition method based on multitask learning and attention mechanism
    Xie, Jinbao
    Wang, Jiyu
    Wang, Qingyan
    Yang, Dali
    Gu, Jinming
    Tang, Yongqiang
    Varatnitski, Yury I.
    NEUROCOMPUTING, 2023, 556
  • [24] Clothing image segmentation method based on feature learning and attention mechanism
    Gu M.
    Liu J.
    Li L.
    Cui L.
    Fangzhi Xuebao/Journal of Textile Research, 2022, 43 (11): 163 - 171
  • [25] Cell Image Segmentation Method Based on Residual Block and Attention Mechanism
    Zhang Wenxiu
    Zhu Zhencai
    Zhang Yonghe
    Wang Xinyu
    Ding Guopeng
    ACTA OPTICA SINICA, 2020, 40 (17)
  • [26] Image segmentation method of RGB image and depth image based on kinect
    Xiao, Z. G.
    Li, N. F.
    Zhang, F.
    Liu, C. C.
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 123 : 91 - 91
  • [27] Underwater color image segmentation method via RGB channel fusion
    Xuan, Li
    Mingjun, Zhang
    OPTICAL ENGINEERING, 2017, 56 (02)
  • [28] Hybrid attention mechanism of feature fusion for medical image segmentation
    Tong, Shanshan
    Zuo, Zhentao
    Liu, Zuxiang
    Sun, Dengdi
    Zhou, Tiangang
    IET IMAGE PROCESSING, 2024, 18 (01) : 77 - 87
  • [29] Infrared and visible image fusion method based on hierarchical attention mechanism
    Li, Qinghua
    Yan, Bao
    Luo, Delin
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (02)
  • [30] Image segmentation based on visual attention mechanism
    Zhang, Qiaorong
    Gu, Guochang
    Xiao, Huimin
    Journal of Multimedia, 2009, 4 (06): 363 - 370