Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things

Cited by: 1
Authors
Yuan, Xiaoming [1 ,2 ]
Kong, Weixuan [1 ]
Luo, Zhenyu [1 ]
Xu, Minrui [3 ]
Affiliations
[1] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066004, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
large language models; efficient inference offloading; mixture-of-experts; Internet of Medical Things;
DOI
10.3390/electronics13112077
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Despite recent significant advances in large language models (LLMs) for medical services, the difficulty of deploying LLMs in e-healthcare hinders complex medical applications in the Internet of Medical Things (IoMT). People are also increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate answers to medical questions (Q&As) and to meet the resource demands of deployment in the IoMT. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, which enables deployment on IoMT devices and improves privacy protection for users. Additionally, we find that the main factors affecting latency are the device interconnection method, the location of the offloading servers, and disk speed.
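To illustrate the general idea of MoE expert offloading described in the abstract, the following PyTorch sketch keeps only a small cache of routed experts on the compute device and parks the remaining experts in host memory, fetching them on demand. This is a minimal, self-contained illustration under assumed names (SimpleExpert, OffloadedMoELayer, cache_size), not the paper's MedMixtral 8x7B implementation or its actual offloading policy.

```python
# Minimal sketch of MoE expert offloading (illustrative; not the paper's code).
# Only `cache_size` experts stay on the compute device; the rest are offloaded
# to host memory and moved over when the router selects them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleExpert(nn.Module):
    """Small feed-forward expert standing in for a Mixtral-style FFN block."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)


class OffloadedMoELayer(nn.Module):
    """MoE layer with a tiny LRU cache of experts resident on the compute device."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2,
                 cache_size=2, compute_device="cpu"):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        # Experts start on the host ("offloaded") side until they are needed.
        self.experts = nn.ModuleList([SimpleExpert(d_model, d_ff) for _ in range(n_experts)])
        self.top_k = top_k
        self.cache_size = cache_size
        self.compute_device = compute_device
        self._resident = []  # LRU list of expert indices currently on the compute device

    def _fetch(self, idx: int) -> nn.Module:
        """Move expert `idx` to the compute device, evicting the least recently used one."""
        if idx in self._resident:
            self._resident.remove(idx)
        elif len(self._resident) >= self.cache_size:
            evicted = self._resident.pop(0)
            self.experts[evicted].to("cpu")  # offload back to host memory
        self._resident.append(idx)
        return self.experts[idx].to(self.compute_device)

    def forward(self, x):  # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, chosen = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e in chosen.unique().tolist():  # run each selected expert once per batch
            expert = self._fetch(e)
            mask = (chosen == e)                     # which tokens routed to expert e
            rows = mask.any(dim=-1)
            w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
            y = expert(x[rows].to(self.compute_device)).to(x.device)
            out[rows] += w * y
        return out


if __name__ == "__main__":
    layer = OffloadedMoELayer()
    tokens = torch.randn(5, 64)
    print(layer(tokens).shape)  # torch.Size([5, 64])
```

In this sketch the per-token routing and the cost of moving evicted experts back and forth are what make interconnect bandwidth, offloading-server placement, and disk speed the dominant latency factors, consistent with the findings stated in the abstract.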
Pages: 17