Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 0
Authors
He Y. [1]
Fang J. [1]
Yu F.R. [1]
Leung V.C. [2]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University
[2] Department of Electrical and Computer Engineering, The University of British Columbia
Keywords
Active inference; Artificial neural networks; Cloud computing; cloud-edge computing; Computational modeling; Edge computing; large language model; Predictive models; reinforcement learning; resource allocation; Resource management; Task analysis; task offloading
DOI
10.1109/TMC.2024.3415661
Abstract
With the growing popularity of and demand for large language model applications on mobile devices, resource-limited mobile terminals struggle to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload large language model (LLM) inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task load variations, all of which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task load scenarios.
Pages: 1 / 12
Number of pages: 11