Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 0
Authors
He Y. [1 ]
Fang J. [1 ]
Yu F.R. [1 ]
Leung V.C. [2 ]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University
[2] Department of Electrical and Computer Engineering, The University of British Columbia
Keywords
Active inference; artificial neural networks; cloud computing; cloud-edge computing; computational modeling; edge computing; large language model; predictive models; reinforcement learning; resource allocation; resource management; task analysis; task offloading
DOI
10.1109/TMC.2024.3415661
Abstract
With the growing popularity of and demand for large language model (LLM) applications on mobile devices, it is difficult for resource-limited mobile terminals to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task load variations, which degrade the performance of LLM services. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task load scenarios.
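The record does not include any code, but as a rough illustration of the active-inference idea the abstract describes, the sketch below shows how an agent might pick an offloading target (local, edge, or cloud) by minimizing expected free energy over a belief about hidden server load, then updating that belief from an observed latency outcome. Everything here is an assumption for illustration: the state/action/observation sets, the probability tables, and the function names are not taken from the paper's model.

    # Minimal active-inference sketch for choosing an LLM offloading target.
    # All distributions and names below are illustrative assumptions, not the
    # authors' formulation.
    import numpy as np

    ACTIONS = ["local", "edge", "cloud"]    # candidate execution targets (assumed)
    STATES = ["low_load", "high_load"]      # hidden server-load states (assumed)
    OBS = ["fast", "slow"]                  # observed latency classes (assumed)

    # Likelihood P(o | s, a): chance of a fast response given load and target
    # (placeholder numbers; each row sums to 1).
    A = np.array([
        [[0.70, 0.30], [0.60, 0.40]],   # local: weakly affected by server load
        [[0.95, 0.05], [0.55, 0.45]],   # edge: fast when lightly loaded
        [[0.85, 0.15], [0.75, 0.25]],   # cloud: slower link, steadier capacity
    ])

    # Log prior preference over observations: the agent "prefers" fast responses.
    log_C = np.log(np.array([0.9, 0.1]))

    def expected_free_energy(q_s, a_idx):
        """G(a) = risk (divergence from preferred outcomes) + ambiguity."""
        q_o = A[a_idx].T @ q_s                      # predicted observation distribution
        risk = q_o @ (np.log(q_o + 1e-12) - log_C)  # KL(q(o|a) || C)
        ambiguity = -(q_s @ (A[a_idx] * np.log(A[a_idx] + 1e-12)).sum(axis=1))
        return risk + ambiguity

    def choose_action(q_s):
        """Pick the offloading target with the lowest expected free energy."""
        g = np.array([expected_free_energy(q_s, i) for i in range(len(ACTIONS))])
        return ACTIONS[int(np.argmin(g))], g

    def update_belief(q_s, a_idx, o_idx):
        """Bayesian update of the load belief after observing the latency class."""
        posterior = A[a_idx][:, o_idx] * q_s
        return posterior / posterior.sum()

    if __name__ == "__main__":
        belief = np.array([0.5, 0.5])               # uniform prior over server load
        action, g = choose_action(belief)
        print(f"offload to: {action}, expected free energy per action: {g.round(3)}")
        # Suppose the response turned out slow; revise the load belief accordingly.
        belief = update_belief(belief, ACTIONS.index(action), OBS.index("slow"))
        print("updated load belief:", belief.round(3))

The design point this sketch tries to convey is the one the abstract contrasts with DRL: instead of learning a reward-maximizing policy from many interactions, the agent scores each candidate action against a generative model and a preference distribution, which is why such methods are often argued to be more data-efficient and quicker to adapt when the task load shifts.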
Pages: 1-12
Page count: 11
Related Papers
50 records in total
  • [32] Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach
    Ullah, Ihsan
    Lim, Hyun-Kyo
    Seok, Yeong-Jun
    Han, Youn-Hee
    JOURNAL OF CLOUD COMPUTING: ADVANCES, SYSTEMS AND APPLICATIONS, 2023, 12 (01)
  • [33] Profit-Maximized Collaborative Computation Offloading and Resource Allocation in Distributed Cloud and Edge Computing Systems
    Yuan, Haitao
    Zhou, MengChu
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2021, 18 (03) : 1277 - 1287
  • [34] Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things
    Yuan, Xiaoming
    Kong, Weixuan
    Luo, Zhenyu
    Xu, Minrui
    ELECTRONICS, 2024, 13 (11)
  • [35] Active inference goes to school: the importance of active learning in the age of large language models
    Di Paolo, Laura Desiree
    White, Ben
    Guenin-Carlut, Avel
    Constant, Axel
    Clark, Andy
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2024, 379 (1911)
  • [36] A Bilevel Optimization Approach for Joint Offloading Decision and Resource Allocation in Cooperative Mobile Edge Computing
    Huang, Pei-Qiu
    Wang, Yong
    Wang, Kezhi
    Liu, Zhi-Zhong
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (10) : 4228 - 4241
  • [37] Decentralized Computation Offloading and Resource Allocation for Mobile-Edge Computing: A Matching Game Approach
    Quoc-Viet Pham
    Tuan Leanh
    Tran, Nguyen H.
    Park, Bang Ju
    Hong, Choong Seon
    IEEE ACCESS, 2018, 6 : 75868 - 75885
  • [38] Joint Offloading and Resource Allocation in Mobile Edge Computing Systems: An Actor-Critic Approach
    Zhang, Zhicai
    Yu, F. Richard
    Fu, Fang
    Yan, Qiao
    Wang, Zhouyang
    2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2018
  • [39] Joint Task Offloading and Resource Allocation for Quality-Aware Edge-Assisted Machine Learning Task Inference
    Fan, Wenhao
    Chen, Zeyu
    Hao, Zhibo
    Wu, Fan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (05) : 6739 - 6752
  • [40] Time-Slotted Task Offloading and Resource Allocation for Cloud-Edge-End Cooperative Computing Networks
    Fan, Wenhao
    Liu, Xun
    Yuan, Hao
    Li, Nan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (08) : 8225 - 8241