Visual Sorting Method Based on Multi-Modal Information Fusion

Cited by: 2
Authors
Han, Song [1 ]
Liu, Xiaoping [1 ]
Wang, Gang [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Modern Post, Beijing 100876, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 06
Keywords
multi-modal; self-attention; Swin Transformer; depth estimation; robot sorting;
DOI
10.3390/app12062946
CLC Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
Visual sorting of stacked parcels is a key issue in intelligent logistics sorting systems. To improve the sorting success rate of express parcels and effectively determine their sorting order, a visual sorting method based on multi-modal information fusion (VS-MF) is proposed in this paper. Firstly, an object detection network based on multi-modal information fusion (OD-MF) is proposed. A global gradient feature extracted from the depth information serves as a self-attention module, allowing the network to learn more spatial features and significantly improving detection accuracy. Secondly, a multi-modal segmentation network based on Swin Transformer (MS-ST) is proposed to detect the optimal sorting positions and poses of parcels. Adding Swin Transformer modules captures finer-grained information about the parcels and the relationships between them. Frequency-domain and depth information are used as supervision signals to obtain pickable areas and infer the occlusion degree of each parcel. A strategy for the optimal sorting order is also proposed to ensure the stability of the system. Finally, a sorting system with a 6-DOF robot is constructed to complete the sorting task of stacked parcels. The accuracy and stability of the system are verified by sorting experiments.
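The abstract does not give implementation details, but the depth-gradient self-attention idea in OD-MF can be illustrated with a minimal sketch. The code below is not the authors' implementation: the module name DepthGradientAttention, the use of Sobel kernels, and the residual re-weighting are hypothetical choices that merely show how a depth map's gradient magnitude could be turned into a spatial attention signal for RGB backbone features.

```python
# Minimal sketch (hypothetical, not the paper's released code): derive a
# self-attention signal from the gradient of a depth map and use it to
# re-weight RGB features in a detection backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGradientAttention(nn.Module):
    """Turns a depth map's gradient magnitude into a spatial attention mask."""
    def __init__(self, feat_channels: int):
        super().__init__()
        # Fixed Sobel kernels for horizontal/vertical depth gradients.
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        self.register_buffer("kx", sobel_x.view(1, 1, 3, 3))
        self.register_buffer("ky", sobel_y.view(1, 1, 3, 3))
        # 1x1 conv maps the single-channel gradient map to per-channel attention logits.
        self.proj = nn.Conv2d(1, feat_channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # depth: (B, 1, H, W) raw depth map; rgb_feat: (B, C, h, w) backbone features.
        gx = F.conv2d(depth, self.kx, padding=1)
        gy = F.conv2d(depth, self.ky, padding=1)
        grad_mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)   # depth edge strength
        grad_mag = F.interpolate(grad_mag, size=rgb_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.proj(grad_mag))          # (B, C, h, w) in [0, 1]
        return rgb_feat * attn + rgb_feat                  # residual re-weighting
```

In a stacked-parcel scene, depth gradients are largest along parcel boundaries, so a mask of this kind emphasizes the edge regions that separate overlapping objects, which is consistent with the abstract's claim that the network learns more spatial features from the depth modality.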
Pages: 16
Related Papers
50 records in total
  • [41] Human activity recognition based on multi-modal fusion
    Cheng Zhang
    Tianqi Zu
    Yibin Hou
    Jian He
    Shengqi Yang
    Ruihai Dong
    [J]. CCF Transactions on Pervasive Computing and Interaction, 2023, 5 : 321 - 332
  • [42] Multi-modal Fusion Based Automatic Pain Assessment
    Zhi, Ruicong
    Yu, Junwei
    [J]. PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 1378 - 1382
  • [43] Online video visual relation detection with hierarchical multi-modal fusion
    He, Yuxuan
    Gan, Ming-Gang
    Ma, Qianzhao
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (24) : 65707 - 65727
  • [44] Video Visual Relation Detection via Multi-modal Feature Fusion
    Sun, Xu
    Ren, Tongwei
    Zi, Yuan
    Wu, Gangshan
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2657 - 2661
  • [45] The multi-modal fusion in visual question answering: a review of attention mechanisms
    Lu, Siyu
    Liu, Mingzhe
    Yin, Lirong
    Yin, Zhengtong
    Liu, Xuan
    Zheng, Wenfeng
    [J]. PEERJ COMPUTER SCIENCE, 2023, 9
  • [46] Learning Visual Emotion Distributions via Multi-Modal Features Fusion
    Zhao, Sicheng
    Ding, Guiguang
    Gao, Yue
    Han, Jungong
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 369 - 377
  • [47] Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing
    Siebert, Tim
    Clasen, Kai Norman
    Ravanbakhsh, Mahdyar
    Demir, Beguem
    [J]. IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXVIII, 2022, 12267
  • [48] A Cross-Modal Guiding and Fusion Method for Multi-Modal RSVP-based Image Retrieval
    Mao, Jiayu
    Qiu, Shuang
    Li, Dan
    Wei, Wei
    He, Huiguang
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [49] Image Visual Attention Mechanism-based Global and Local Semantic Information Fusion for Multi-modal English Machine Translation
    Zhengzhou Railway Vocational and Technical College, Zhengzhou 450000, China
    [J]. J. Comput, 2: 37 - 50
  • [50] Multi-modal visual tracking based on textual generation
    Wang, Jiahao
    Liu, Fang
    Jiao, Licheng
    Wang, Hao
    Li, Shuo
    Li, Lingling
    Chen, Puhua
    Liu, Xu
    [J]. INFORMATION FUSION, 2024, 112