Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 0
Authors
He Y. [1]
Fang J. [1]
Yu F.R. [1]
Leung V.C. [2]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University
[2] Department of Electrical and Computer Engineering, The University of British Columbia
Keywords
Active inference; Artificial neural networks; Cloud computing; cloud-edge computing; Computational modeling; Edge computing; large language model; Predictive models; reinforcement learning; resource allocation; Resource management; Task analysis; task offloading
DOI
10.1109/TMC.2024.3415661
Abstract
With the growing popularity of, and demand for, large language model (LLM) applications on mobile devices, resource-limited mobile terminals struggle to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and non-adaptability to task load variations, which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task load scenarios.
Pages: 1 - 12
Page count: 11
Related papers
50 in total
  • [41] Toward Mobility-Aware Computation Offloading and Resource Allocation in End-Edge-Cloud Orchestrated Computing
    Dai, Bin
    Niu, Jianwei
    Ren, Tao
    Atiquzzaman, Mohammed
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (19) : 19450 - 19462
  • [42] Game Theory-Based Task Offloading and Resource Allocation for Vehicular Networks in Edge-Cloud Computing
    Jiang, Qinting
    Xu, Xiaolong
    He, Qiang
    Zhang, Xuyun
    Dai, Fei
    Qi, Lianyong
    Dou, Wanchun
    2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021, 2021 : 341 - 346
  • [43] Multi-resource maximin share fair allocation in the cloud-edge collaborative computing system with bandwidth demand compression
    Guo, Hao
    Deng, Bin
    Li, Weidong
    Cluster Computing, 2025, 28 (02)
  • [44] Learn to Coordinate for Computation Offloading and Resource Allocation in Edge Computing: A Rational-Based Distributed Approach
    Liu, Zhicheng
    Zhao, Yunfeng
    Song, Jinduo
    Qiu, Chao
    Chen, Xu
    Wang, Xiaofei
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2022, 9 (05) : 3136 - 3151
  • [45] Offloading and Resource Allocation With General Task Graph in Mobile Edge Computing: A Deep Reinforcement Learning Approach
    Yan, Jia
    Bi, Suzhi
    Zhang, Ying-Jun Angela
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2020, 19 (08) : 5404 - 5419
  • [46] DRL-Driven Joint Task Offloading and Resource Allocation for Energy-Efficient Content Delivery in Cloud-Edge Cooperation Networks
    Fang, Chao
    Hu, Zhaoming
    Meng, Xiangheng
    Tu, Shanshan
    Wang, Zhuwei
    Zeng, Deze
    Ni, Wei
    Guo, Song
    Han, Zhu
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (12) : 16195 - 16207
  • [47] Joint Offloading and Resource Allocation for Hybrid Cloud and Edge Computing in SAGINs: A Decision Assisted Hybrid Action Space Deep Reinforcement Learning Approach
    Huang, Chong
    Chen, Gaojie
    Xiao, Pei
    Xiao, Yue
    Han, Zhu
    Chambers, Jonathon A.
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2024, 42 (05) : 1029 - 1043
  • [48] Cloud-Edge Collaborative Resource Allocation for Blockchain-Enabled Internet of Things: A Collective Reinforcement Learning Approach
    Li, Meng
    Pei, Pan
    Yu, F. Richard
    Si, Pengbo
    Li, Yu
    Sun, Enchang
    Zhang, Yanhua
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (22) : 23115 - 23129
  • [49] An Optimal Transport-Based Federated Reinforcement Learning Approach for Resource Allocation in Cloud-Edge Collaborative IoT
    Gan, Deqiao
    Ge, Xiaohu
    Li, Qiang
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (02) : 2407 - 2419
  • [50] Intelligent Resource Allocation for UAV-Based Cognitive NOMA Networks: An Active Inference Approach
    Obite, Felix
    Krayani, Ali
    Alam, Atm S.
    Marcenaro, Lucio
    Nallanathan, Arumugam
    Regazzoni, Carlo
    2023 IEEE FUTURE NETWORKS WORLD FORUM, FNWF, 2024,