Visual Sorting Method Based on Multi-Modal Information Fusion

Cited by: 2
Authors
Han, Song [1 ]
Liu, Xiaoping [1 ]
Wang, Gang [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Modern Post, Beijing 100876, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 06
Keywords
multi-modal; self-attention; Swin Transformer; depth estimation; robot sorting;
DOI
10.3390/app12062946
CLC number
O6 [Chemistry];
Discipline code
0703
Abstract
Visual sorting of stacked parcels is a key issue in intelligent logistics sorting systems. To improve the sorting success rate of express parcels and effectively determine their sorting order, a visual sorting method based on multi-modal information fusion (VS-MF) is proposed in this paper. Firstly, an object detection network based on multi-modal information fusion (OD-MF) is proposed. The global gradient feature extracted from depth information serves as a self-attention module, so the network learns more spatial features and detection accuracy improves significantly. Secondly, a multi-modal segmentation network based on the Swin Transformer (MS-ST) is proposed to detect the optimal sorting positions and poses of parcels. Adding Swin Transformer modules captures more fine-grained information about the parcels and the relationships between them. Frequency-domain and depth information are used as supervision signals to obtain the pickable areas and infer the occlusion degree of each parcel. A strategy for determining the optimal sorting order is also proposed to ensure the stability of the system. Finally, a sorting system with a 6-DOF robot is constructed to complete the sorting task for stacked parcels. The accuracy and stability of the system are verified by sorting experiments.
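The abstract's central idea in OD-MF, using the global gradient of the depth map as a self-attention signal over RGB features, can be illustrated with a minimal sketch. The function name, the gradient operator (`np.gradient` rather than whatever filter the paper uses), and the max-normalisation scheme are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def depth_gradient_attention(rgb_feat, depth):
    """Illustrative sketch: weight an RGB feature map by an attention
    mask derived from the depth map's gradient magnitude, so regions
    with strong depth discontinuities (parcel edges) are emphasised."""
    gy, gx = np.gradient(depth.astype(float))   # per-axis depth gradients
    grad = np.sqrt(gx ** 2 + gy ** 2)           # gradient magnitude
    attn = grad / (grad.max() + 1e-8)           # normalise to [0, 1]
    return rgb_feat * attn[None, :, :]          # broadcast over channels

# Toy example: a 3-channel feature map over an 8x8 depth image that
# ramps horizontally, so the gradient magnitude is uniform.
feat = np.ones((3, 8, 8))
depth = np.tile(np.arange(8, dtype=float), (8, 1))
out = depth_gradient_attention(feat, depth)
```

In a real network this mask would modulate intermediate feature maps inside the detector rather than raw inputs, and the normalisation would typically be learned (e.g. a sigmoid gate) instead of a fixed max-division.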
Pages: 16
Related Papers
50 total
  • [21] Multi-Modal Emotion Aware System Based on Fusion of Speech and Brain Information
    Ghoniem, Rania M.
    Algarni, Abeer D.
    Shaalan, Khaled
    [J]. INFORMATION, 2019, 10 (07)
  • [22] Robust Deep Multi-modal Learning Based on Gated Information Fusion Network
    Kim, Jaekyum
    Koh, Junho
    Kim, Yecheol
    Choi, Jaehyung
    Hwang, Youngbae
    Choi, Jun Won
    [J]. COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 90 - 106
  • [23] Stacked Multi-modal Refining and Fusion Network for Visual Entailment
    Yao, Yuan
    Hu, Min
    Wang, Xiaohua
    Liu, Chuqing
    [J]. THIRTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2021), 2022, 12083
  • [24] Automatic Inspection of Railway Carbon Strips Based on Multi-Modal Visual Information
    Di Stefano, Erika
    Avizzano, Carlo Alberto
    Bergamasco, Massimo
    Masini, Paolo
    Menci, Mauro
    Russo, Davide
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT MECHATRONICS (AIM), 2017, : 178 - 184
  • [25] Multi-modal Information Extraction and Fusion with Convolutional Neural Networks
    Kumar, Dinesh
    Sharma, Dharmendra
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [26] Multi-modal simultaneous machine translation fusion of image information
    Huang, Yan
Wang, Zhanyang
    Zhang, TianYuan
    Xu, Chun
Liang, Hui
    [J]. JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02):
  • [27] Adaptive information fusion network for multi-modal personality recognition
    Bao, Yongtang
    Liu, Xiang
    Qi, Yue
    Liu, Ruijun
    Li, Haojie
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (03)
  • [28] Contextual Information Driven Multi-modal Medical Image Fusion
    Luo, Xiao-Qing
    Zhang, Zhan-Cheng
    Zhang, Bao-Cheng
    Wu, Xiao-Jun
    [J]. IETE TECHNICAL REVIEW, 2017, 34 (06) : 598 - 611
  • [29] Test method of laser paint removal based on multi-modal feature fusion
    Huang Hai-peng
    Hao Ben-tian
    Ye De-jun
    Gao Hao
    Li Liang
    [J]. JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2022, 29 (10) : 3385 - 3398
  • [30] Multi-modal Fusion Brain Tumor Detection Method Based on Deep Learning
    Yao Hong-ge
    Shen Xin-xia
    Li Yu
    Yu Jun
    Lei Song-ze
    [J]. ACTA PHOTONICA SINICA, 2019, 48 (07)