Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 0
Authors
He Y. [1 ]
Fang J. [1 ]
Yu F.R. [1 ]
Leung V.C. [2 ]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University
[2] Department of Electrical and Computer Engineering, The University of British Columbia
Keywords
Active inference; artificial neural networks; cloud computing; cloud-edge computing; computational modeling; edge computing; large language model; predictive models; reinforcement learning; resource allocation; resource management; task analysis; task offloading
DOI
10.1109/TMC.2024.3415661
Abstract
With the growing popularity of and demand for large language model (LLM) applications on mobile devices, it is difficult for resource-limited mobile terminals to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task load variations, which degrade the performance of LLM services. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task load scenarios.
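The record does not include any code, but as a rough illustration of the active-inference idea the abstract describes, the sketch below shows how an agent might pick an offloading target (local, edge, or cloud) by minimizing expected free energy over a belief about hidden server load, then updating that belief from an observed latency outcome. Everything here is an assumption for illustration: the state/action/observation sets, the probability tables, and the function names are not taken from the paper's model.

    # Minimal active-inference sketch for choosing an LLM offloading target.
    # All distributions and names below are illustrative assumptions, not the
    # authors' formulation.
    import numpy as np

    ACTIONS = ["local", "edge", "cloud"]    # candidate execution targets (assumed)
    STATES = ["low_load", "high_load"]      # hidden server-load states (assumed)
    OBS = ["fast", "slow"]                  # observed latency classes (assumed)

    # Likelihood P(o | s, a): chance of a fast response given load and target
    # (placeholder numbers; each row sums to 1).
    A = np.array([
        [[0.70, 0.30], [0.60, 0.40]],   # local: weakly affected by server load
        [[0.95, 0.05], [0.55, 0.45]],   # edge: fast when lightly loaded
        [[0.85, 0.15], [0.75, 0.25]],   # cloud: slower link, steadier capacity
    ])

    # Log prior preference over observations: the agent "prefers" fast responses.
    log_C = np.log(np.array([0.9, 0.1]))

    def expected_free_energy(q_s, a_idx):
        """G(a) = risk (divergence from preferred outcomes) + ambiguity."""
        q_o = A[a_idx].T @ q_s                      # predicted observation distribution
        risk = q_o @ (np.log(q_o + 1e-12) - log_C)  # KL(q(o|a) || C)
        ambiguity = -(q_s @ (A[a_idx] * np.log(A[a_idx] + 1e-12)).sum(axis=1))
        return risk + ambiguity

    def choose_action(q_s):
        """Pick the offloading target with the lowest expected free energy."""
        g = np.array([expected_free_energy(q_s, i) for i in range(len(ACTIONS))])
        return ACTIONS[int(np.argmin(g))], g

    def update_belief(q_s, a_idx, o_idx):
        """Bayesian update of the load belief after observing the latency class."""
        posterior = A[a_idx][:, o_idx] * q_s
        return posterior / posterior.sum()

    if __name__ == "__main__":
        belief = np.array([0.5, 0.5])               # uniform prior over server load
        action, g = choose_action(belief)
        print(f"offload to: {action}, expected free energy per action: {g.round(3)}")
        # Suppose the response turned out slow; revise the load belief accordingly.
        belief = update_belief(belief, ACTIONS.index(action), OBS.index("slow"))
        print("updated load belief:", belief.round(3))

The design point this sketch tries to convey is the one the abstract contrasts with DRL: instead of learning a reward-maximizing policy from many interactions, the agent scores each candidate action against a generative model and a preference distribution, which is why such methods are often argued to be more data-efficient and quicker to adapt when the task load shifts.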
Pages: 1-12
Page count: 11
Related Papers
50 records in total
  • [32] Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach
    Ullah, Ihsan
    Lim, Hyun-Kyo
    Seok, Yeong-Jun
    Han, Youn-Hee
    JOURNAL OF CLOUD COMPUTING: ADVANCES, SYSTEMS AND APPLICATIONS, 2023, 12 (01)
  • [33] Profit-Maximized Collaborative Computation Offloading and Resource Allocation in Distributed Cloud and Edge Computing Systems
    Yuan, Haitao
    Zhou, MengChu
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2021, 18 (03) : 1277 - 1287
  • [34] Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things
    Yuan, Xiaoming
    Kong, Weixuan
    Luo, Zhenyu
    Xu, Minrui
    ELECTRONICS, 2024, 13 (11)
  • [35] Active inference goes to school: the importance of active learning in the age of large language models
    Di Paolo, Laura Desiree
    White, Ben
    Guenin-Carlut, Avel
    Constant, Axel
    Clark, Andy
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2024, 379 (1911)
  • [36] A Bilevel Optimization Approach for Joint Offloading Decision and Resource Allocation in Cooperative Mobile Edge Computing
    Huang, Pei-Qiu
    Wang, Yong
    Wang, Kezhi
    Liu, Zhi-Zhong
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (10) : 4228 - 4241
  • [37] Decentralized Computation Offloading and Resource Allocation for Mobile-Edge Computing: A Matching Game Approach
    Quoc-Viet Pham
    Tuan Leanh
    Tran, Nguyen H.
    Park, Bang Ju
    Hong, Choong Seon
    IEEE ACCESS, 2018, 6 : 75868 - 75885
  • [38] Joint Offloading and Resource Allocation in Mobile Edge Computing Systems: An Actor-Critic Approach
    Zhang, Zhicai
    Yu, F. Richard
    Fu, Fang
    Yan, Qiao
    Wang, Zhouyang
    2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2018
  • [39] Joint Task Offloading and Resource Allocation for Quality-Aware Edge-Assisted Machine Learning Task Inference
    Fan, Wenhao
    Chen, Zeyu
    Hao, Zhibo
    Wu, Fan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (05) : 6739 - 6752
  • [40] Time-Slotted Task Offloading and Resource Allocation for Cloud-Edge-End Cooperative Computing Networks
    Fan, Wenhao
    Liu, Xun
    Yuan, Hao
    Li, Nan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (08) : 8225 - 8241