YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism

Cited by: 14
|
Authors
Liu, Cheng [1, 2]
Feng, Qingchun [1, 3]
Sun, Yuhuan [1, 3]
Li, Yajun [1, 3]
Ru, Mengfei [1, 3]
Xu, Lijia [2]
Affiliations
[1] Beijing Acad Agr & Forestry Sci, Intelligent Equipment Res Ctr, Beijing 100097, Peoples R China
[2] Sichuan Agr Univ, Coll Mech & Elect Engn, Yaan 625014, Peoples R China
[3] Beijing Key Lab Intelligent Equipment Technol Agr, Beijing 100097, Peoples R China
Keywords
Multimodal fusion; Attention mechanism; YOLACT; Tomato main-stem; Multimodal loss function; CLASSIFICATION;
DOI
10.1016/j.compag.2023.108186
Chinese Library Classification (CLC) number
S [Agricultural Sciences];
Discipline classification code
09 ;
Abstract
The tomato plant's main-stem is a feasible guide for robots searching for the discretely growing targets of harvesting, pruning, or pollination. Exploiting the strong reflectance of the main-stem in the near-infrared (NIR) waveband, this study proposes a multimodal hierarchical fusion method (YOLACTFusion), based on an attention mechanism, to achieve instance segmentation of the main-stem against similarly colored surroundings (i.e., green leaves and green fruit) in robotic vision systems. The model feeds RGB images and 900-1100 nm NIR images into two ResNet50 backbone networks and uses a parallel attention mechanism to fuse feature maps of various scales into the head network, improving segmentation of the main-stem in RGB images. The loss function for the multimodal image is a weighted combination of the original loss on the RGB image and the position offset loss and classification loss on the NIR image. Furthermore, local depthwise separable convolution is used in the backbone network, and Conv-BN layers are merged to reduce computational complexity. The results show that the precision and recall of YOLACTFusion for main-stem detection reached 93.90 % and 62.60 %, respectively, and the precision and recall for instance segmentation reached 95.12 % and 63.41 %, respectively. Compared with YOLACT, the mean average precision (mAP) of YOLACTFusion increased from 39.20 % to 46.29 % and the model size decreased from 199.03 MB to 165.52 MB, while the image processing efficiency remained similar. Overall, the multimodal instance segmentation method proposed in this study significantly improves the detection and segmentation of tomato main-stems against a similarly colored background, making it a potential method for improving the visual perception of agricultural robots.
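The abstract mentions merging Conv-BN layers to reduce computation. The record does not give the authors' implementation, so the following is only a minimal sketch of the standard inference-time folding identity, written in plain Python for a 1x1 convolution on a single pixel. All function names (`fold_conv_bn`, `conv1x1`, `batch_norm`) and the numeric values are hypothetical and chosen purely for illustration.

```python
import math

def fold_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into conv weights and bias.

    weights: one flattened kernel (list of floats) per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    Hypothetical helper; not taken from the paper.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)  # shifted and rescaled bias
    return folded_w, folded_b

def conv1x1(x, weights, bias):
    """1x1 convolution at one pixel: x holds the input channel values."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weights, bias)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

# Illustrative values only: 3 input channels, 2 output channels.
x = [0.5, -1.0, 2.0]
weights = [[0.2, -0.1, 0.4], [1.0, 0.3, -0.5]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]

# Reference path: convolution followed by a separate BatchNorm layer.
reference_out = batch_norm(conv1x1(x, weights, bias), gamma, beta, mean, var)

# Fused path: a single convolution with folded parameters.
folded_w, folded_b = fold_conv_bn(weights, bias, gamma, beta, mean, var)
fused_out = conv1x1(x, folded_w, folded_b)

max_diff = max(abs(a - b) for a, b in zip(reference_out, fused_out))
print("max difference between the two paths:", max_diff)
```

Because BatchNorm at inference time is a fixed per-channel affine map, the fused layer should reproduce the two-layer output up to floating-point rounding while removing one layer from the forward pass.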
Pages: 14