Learning to multi-vehicle cooperative bin packing problem via sequence-to-sequence policy network with deep reinforcement learning model

Cited by: 3
Authors
Tian, Ran [1]
Kang, Chunming [1]
Bi, Jiaming [1]
Ma, Zhongyu [1]
Liu, Yanxing [1]
Yang, Saisai [1]
Li, Fangfang [1]
Affiliations
[1] Northwest Normal Univ, Dept Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep Reinforcement Learning; 3D Bin Packing Policy; Position Sequence; Logistics Packing; SEARCH ALGORITHM; LOCAL SEARCH; SUPPLY CHAIN; OPTIMIZATION
DOI
10.1016/j.cie.2023.108998
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In logistics bin packing scenarios where vehicles have only rear doors, the packing sequence of items determines how well the vehicle's packing space is utilized, yet relatively little research addresses optimizing that sequence. This article therefore focuses on the packing sequence problem within the multi-vehicle cooperative bin packing problem (MVCBPP) and proposes a sequence-to-sequence policy network with deep reinforcement learning (S2SDRL). First, a sequence-to-sequence neural network is constructed to determine the packing probability of every item; its encoder and decoder combine a bidirectional LSTM with an attention module. Second, the packing strategy for the items is obtained through the constructed reinforcement learning packing framework. Finally, the Seq2Seq policy network is updated and optimized by the policy gradient method with a baseline to obtain the current optimal packing strategy. Across several bin packing scenarios, S2SDRL improves average vehicle space utilization by more than 4.0% compared with the traditional packing algorithm, and the model's forward computation time is much shorter than that of the traditional heuristic algorithm, which makes the model more practical for real-world use. Ablation experiments confirm the effectiveness of the modules in S2SDRL for optimizing the packing order. The sensitivity analysis shows that the model remains reasonably stable when the input data changes.
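The abstract's pipeline (score items, sample a packing order, update the policy with a baselined policy gradient) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the per-item scores stand in for whatever the Seq2Seq network would output, the rewards stand in for space utilization, and the batch-mean baseline is an assumption, since the abstract does not say which baseline is used.

```python
import math
import random


def sample_order(scores, rng):
    """Sample a packing order (a permutation of item indices).

    At each step, a softmax over the scores of the items not yet packed
    gives the probability of packing each one next (assumed decoding rule).
    Returns the sampled order and its total log-probability.
    """
    remaining = list(range(len(scores)))
    order, log_prob = [], 0.0
    while remaining:
        top = max(scores[i] for i in remaining)  # subtract max for stability
        weights = [math.exp(scores[i] - top) for i in remaining]
        total = sum(weights)
        pick = rng.choices(range(len(remaining)), weights=weights)[0]
        log_prob += math.log(weights[pick] / total)
        order.append(remaining.pop(pick))
    return order, log_prob


def advantages(rewards):
    """Subtract a batch-mean baseline (an assumed choice) from each reward."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def surrogate_loss(log_probs, rewards):
    """REINFORCE-with-baseline surrogate: -mean(advantage * log_prob).

    Minimizing this with respect to the policy parameters raises the
    probability of orders whose reward beats the baseline.
    """
    adv = advantages(rewards)
    return -sum(a * lp for a, lp in zip(adv, log_probs)) / len(rewards)


if __name__ == "__main__":
    rng = random.Random(0)
    item_scores = [0.5, 1.0, -0.2, 0.0]  # hypothetical network outputs
    order, log_prob = sample_order(item_scores, rng)
    print("sampled packing order:", order)
    print("log-probability of that order:", log_prob)
```

In a real training loop the log-probabilities would come from a differentiable model, so that an automatic differentiation framework could backpropagate through the surrogate loss; here they are plain floats to keep the sketch dependency-free.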
Pages: 13