Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 0
Authors
He Y. [1]
Fang J. [1]
Yu F.R. [1]
Leung V.C. [2]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University
[2] Department of Electrical and Computer Engineering, The University of British Columbia
Keywords
Active inference; Artificial neural networks; Cloud computing; cloud-edge computing; Computational modeling; Edge computing; large language model; Predictive models; reinforcement learning; resource allocation; Resource management; Task analysis; task offloading
DOI
10.1109/TMC.2024.3415661
Abstract
With the growing popularity of, and demand for, large language model (LLM) applications on mobile devices, resource-limited mobile terminals struggle to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task load variations, all of which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that the proposed method outperforms mainstream DRL methods, uses data more efficiently, and adapts better to changing task loads.
Pages: 1 / 12
Number of pages: 11
Related papers
50 in total
  • [1] Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Networks: An Active Inference Approach
    Fang, Jingcheng
    He, Ying
    Yu, F. Richard
    Li, Jianqiang
    Leung, Victor C.
    2023 IEEE 98TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2023-FALL, 2023
  • [2] Incentive-driven Computation Offloading and Resource Allocation in Mobile Cloud-Edge Computing
    Li, Mingze
    Wu, Tong
    Zhou, Huan
    Zhao, Liang
    Leung, Victor C. M.
    2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW), 2022, : 157 - 162
  • [3] Reverse Auction-Based Computation Offloading and Resource Allocation in Mobile Cloud-Edge Computing
    Zhou, Huan
    Wu, Tong
    Chen, Xin
    He, Shibo
    Guo, Deke
    Wu, Jie
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (10) : 6144 - 6159
  • [4] Multiuser Computation Offloading and Resource Allocation for Cloud-Edge Heterogeneous Network
    Chen, Qinglin
    Kuang, Zhufang
    Zhao, Lian
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (05) : 3799 - 3811
  • [5] Adaptive Data Sharing and Computation Offloading in Cloud-Edge Computing with Resource Constraints
    Chu, Wenjie
    Zhao, Haiyan
    Jin, Zhi
    Hu, Zhenjiang
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 2842 - 2849
  • [6] Energy-Efficient Cloud-Edge Collaborative Computing: Joint Task Offloading, Resource Allocation, and Service Caching
    Liang, Yong
    Sun, Haifeng
    Deng, Yunfeng
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14879 : 285 - 296
  • [7] Towards Blockchain-Based Resource Allocation Models for Cloud-Edge Computing in IoT Applications
    Liu, Xing
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 135 (04) : 2483 - 2483
  • [8] Beyond the Cloud: Edge Inference for Generative Large Language Models in Wireless Networks
    Zhang, Xinyuan
    Nie, Jiangtian
    Huang, Yudong
    Xie, Gaochang
    Xiong, Zehui
    Liu, Jiang
    Niyato, Dusit
    Shen, Xuemin
    IEEE Transactions on Wireless Communications, 2025, 24 (01) : 643 - 658
  • [9] Task Offloading and Resource Allocation for Edge-Cloud Collaborative Computing
    Wang, Yaxing
    Hao, Jia
    Xu, Gang
    Huang, Baoqi
    Zhang, Feng
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT V, 2024, 14491 : 361 - 372
  • [10] Dynamic Resource Allocation for Cloud-Edge Collaboration Offloading in VEC Networks With Diverse Tasks
    Geng, Jingwei
    Qin, Zaiming
    Jin, Shunfu
    IEEE Transactions on Intelligent Transportation Systems, 2024, 25 (12) : 21235 - 21251