Optimizing Microservice Deployment in Edge Computing with Large Language Models: Integrating Retrieval Augmented Generation and Chain of Thought Techniques

Times Cited: 0
Authors
Feng, Kan [1]
Luo, Lijun [1]
Xia, Yongjun [2]
Luo, Bin [2]
He, Xingfeng [1]
Li, Kaihong [3]
Zha, Zhiyong [4]
Xu, Bo [1,5]
Peng, Kai [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Hubei Key Lab Smart Internet Technol, Wuhan 430074, Peoples R China
[2] Hubei Huazhong Elect Power Technol Dev Co Ltd, Wuhan 430079, Peoples R China
[3] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[4] State Grid Informat Telecommun Co Ltd, Wuhan 430048, Peoples R China
[5] Hubei ChuTianYun Co Ltd, Wuhan 430076, Peoples R China
Source
SYMMETRY-BASEL | 2024, Vol. 16, No. 11
Keywords
large language models; retrieval augmented generation; microservice deployment; mobile edge computing;
DOI
10.3390/sym16111470
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline Classification Codes
07; 0710; 09
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in automatically generating code from natural-language instructions. We observed that, in the microservice models of edge computing, the problem of deployment latency optimization can be transformed into an NP-hard mathematical optimization problem. In the real world, however, deployment strategies at the edge often require immediate updates, while human-engineered code tends to lag behind. To bridge this gap, we integrated LLMs into the decision-making process for microservice deployment. We first constructed a private Retrieval Augmented Generation (RAG) database containing prior knowledge. We then employed carefully designed, step-by-step inductive instructions together with the chain-of-thought (CoT) technique to enable the LLM to learn, reason, reflect, and regenerate. We decomposed the microservice deployment latency optimization problem into a collection of granular sub-problems (described in natural language) and progressively supplied instructions to the fine-tuned LLM to generate the corresponding code blocks. The generated code blocks then underwent integration and consistency assessment. For comparison, we also prompted the LLM to generate code without the RAG database. We executed the generated code and the comparison algorithms under identical operating environments and simulation parameters and rigorously analyzed the results. Compared with traditional algorithms, our fine-tuned model significantly reduced latency: by 22.8% when handling surges in request flows, by 37.8% when managing complex microservice types, and by 39.5% when processing increased numbers of network nodes. Moreover, our approach demonstrated marked improvements in latency over both LLMs that do not use RAG and reinforcement learning algorithms reported in other literature. The use of LLMs also highlights the concept of symmetry: the symmetrical structure of input-output relationships in microservice deployment models aligns with the LLM's inherent ability to process and generate balanced, optimized code. Symmetry in this context allows more efficient resource allocation and reduces redundant operations, further enhancing the model's effectiveness. We believe that LLMs hold substantial potential for optimizing microservice deployment models.
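The pipeline described in the abstract (retrieve prior knowledge from the private RAG store, prompt the fine-tuned LLM step by step with CoT, then integrate and consistency-check the generated blocks) lends itself to a compact illustration. The underlying problem is NP-hard because placing microservices on capacity-constrained edge nodes to minimize end-to-end latency generalizes bin packing. The Python sketch below is illustrative only: KNOWLEDGE_BASE, SUBPROBLEMS, retrieve, and llm_generate are hypothetical stand-ins for the authors' actual RAG store, problem decomposition, retriever, and model client, none of which the record reproduces.

```python
"""Minimal sketch of a RAG + CoT code-generation loop (illustrative only)."""

from difflib import SequenceMatcher

# Hypothetical private knowledge base: prior deployment heuristics and
# solved examples that the retriever surfaces for each sub-problem.
KNOWLEDGE_BASE = [
    "Greedy placement: assign each microservice to the edge node that "
    "minimizes marginal response latency, subject to CPU/memory capacity.",
    "Latency model: end-to-end delay = transmission + queuing + processing "
    "delay along the microservice call chain.",
    "Consistency check: run generated code on a fixed simulation seed and "
    "compare outputs across regenerations before integration.",
]

# Illustrative stand-ins for the paper's natural-language sub-problems.
SUBPROBLEMS = [
    "Parse the network topology and the microservice call chain.",
    "Compute end-to-end latency for a candidate placement.",
    "Search placements minimizing total latency under capacity limits.",
]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever ranking entries by string similarity; a real RAG
    store would use embeddings and a vector index."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: SequenceMatcher(None, query.lower(), doc.lower()).ratio(),
        reverse=True,
    )
    return ranked[:k]


def llm_generate(prompt: str) -> str:
    """Stub for the fine-tuned LLM call; replace with a real model client."""
    return f"# code block generated for: {prompt.splitlines()[0]}"


def solve_with_rag_cot(subproblems: list[str]) -> str:
    """CoT loop: retrieve context, prompt step by step, ask the model to
    reflect on consistency, and integrate the generated code blocks."""
    blocks = []
    for step, task in enumerate(subproblems, start=1):
        context = "\n".join(retrieve(task))
        prompt = (
            f"Step {step}: {task}\n"
            f"Relevant prior knowledge:\n{context}\n"
            "Think step by step, write a Python code block, and check it "
            "is consistent with the previous blocks; regenerate if not."
        )
        blocks.append(llm_generate(prompt))
    # Integration: concatenate blocks; a consistency assessment would then
    # execute the result under fixed simulation parameters.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(solve_with_rag_cot(SUBPROBLEMS))
```

In the paper's evaluation, the integrated code is then executed under fixed simulation parameters and compared against a no-RAG variant and traditional algorithms; the sketch shows only the control flow of the generate-reflect-regenerate loop.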
Pages: 22
Related Papers
50 records in total
  • [21] Facilitating university admission using a chatbot based on large language models with retrieval-augmented generation
    Chen, Zheng
    Zou, Di
    Xie, Haoran
    Lou, Huajie
    Pang, Zhiyuan
EDUCATIONAL TECHNOLOGY & SOCIETY, 2024, 27 (04): 454-470
  • [22] DEVELOPMENT OF A RETRIEVAL-AUGMENTED GENERATION PIPELINE LEVERAGING LARGE LANGUAGE MODELS TO SUPPORT EVIDENCE SYNTHESIS
    Perera, C.
    Heron, L.
    Hirst, A.
    VALUE IN HEALTH, 2024, 27 (12)
  • [23] Advancing Cyber Incident Timeline Analysis Through Retrieval-Augmented Generation and Large Language Models
    Loumachi, Fatma Yasmine
    Ghanem, Mohamed Chahine
    Ferrag, Mohamed Amine
    COMPUTERS, 2025, 14 (02)
  • [24] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025: 1183-1189
  • [25] Leveraging Retrieval-Augmented Generation for Reliable Medical Question Answering Using Large Language Models
    Kharitonova, Ksenia
    Perez-Fernandez, David
    Gutierrez-Hernando, Javier
    Gutierrez-Fandino, Asier
    Callejas, Zoraida
    Griol, David
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, HAIS 2024, 2025, 14858: 141-153
  • [26] OG-RAG: ONTOLOGY-GROUNDED RETRIEVAL-AUGMENTED GENERATION FOR LARGE LANGUAGE MODELS
    Sharma, Kartik
    Kumar, Peeyush
    Li, Yunqing
    arXiv,
  • [27] CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
    Lyu, Yuanjie
    Li, Zhiyu
    Niu, Simin
    Xiong, Feiyu
    Tang, Bo
    Wang, Wenjin
    Wu, Hao
    Liu, Huanyong
    Xu, Tong
    Chen, Enhong
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [28] Optimized interaction with Large Language Models: A practical guide to Prompt Engineering and Retrieval-Augmented Generation
    Fink, Anna
    Rau, Alexander
    Kotter, Elmar
    Bamberg, Fabian
    Russe, Maximilian Frederik
RADIOLOGIE, 2025
  • [30] ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
    Chen, Zhipeng
    Zhou, Kun
    Zhang, Beichen
    Gong, Zheng
    Zhao, Wayne Xin
    Wen, Ji-Rong
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 14777-14790