Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things

Cited by: 1
Authors
Yuan, Xiaoming [1 ,2 ]
Kong, Weixuan [1 ]
Luo, Zhenyu [1 ]
Xu, Minrui [3 ]
Affiliations
[1] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066004, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
large language models; efficient inference offloading; mixture-of-experts; Internet of Medical Things;
DOI
10.3390/electronics13112077
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Despite recent significant advances in large language models (LLMs) for medical services, the difficulty of deploying LLMs in e-healthcare hinders complex medical applications in the Internet of Medical Things (IoMT). People are also increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate answers to medical questions (Q&As) and to meet the resource demands of deployment in the IoMT. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, which enables deployment on IoMT devices and improves privacy protection for users. Additionally, we find that the main factors affecting latency are the device interconnection method, the location of the offloading servers, and disk speed.
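To illustrate the general idea of MoE expert offloading described in the abstract, the following PyTorch sketch keeps only a small cache of routed experts on the compute device and parks the remaining experts in host memory, fetching them on demand. This is a minimal, self-contained illustration under assumed names (SimpleExpert, OffloadedMoELayer, cache_size), not the paper's MedMixtral 8x7B implementation or its actual offloading policy.

```python
# Minimal sketch of MoE expert offloading (illustrative; not the paper's code).
# Only `cache_size` experts stay on the compute device; the rest are offloaded
# to host memory and moved over when the router selects them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleExpert(nn.Module):
    """Small feed-forward expert standing in for a Mixtral-style FFN block."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)


class OffloadedMoELayer(nn.Module):
    """MoE layer with a tiny LRU cache of experts resident on the compute device."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2,
                 cache_size=2, compute_device="cpu"):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        # Experts start on the host ("offloaded") side until they are needed.
        self.experts = nn.ModuleList([SimpleExpert(d_model, d_ff) for _ in range(n_experts)])
        self.top_k = top_k
        self.cache_size = cache_size
        self.compute_device = compute_device
        self._resident = []  # LRU list of expert indices currently on the compute device

    def _fetch(self, idx: int) -> nn.Module:
        """Move expert `idx` to the compute device, evicting the least recently used one."""
        if idx in self._resident:
            self._resident.remove(idx)
        elif len(self._resident) >= self.cache_size:
            evicted = self._resident.pop(0)
            self.experts[evicted].to("cpu")  # offload back to host memory
        self._resident.append(idx)
        return self.experts[idx].to(self.compute_device)

    def forward(self, x):  # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, chosen = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e in chosen.unique().tolist():  # run each selected expert once per batch
            expert = self._fetch(e)
            mask = (chosen == e)                     # which tokens routed to expert e
            rows = mask.any(dim=-1)
            w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
            y = expert(x[rows].to(self.compute_device)).to(x.device)
            out[rows] += w * y
        return out


if __name__ == "__main__":
    layer = OffloadedMoELayer()
    tokens = torch.randn(5, 64)
    print(layer(tokens).shape)  # torch.Size([5, 64])
```

In this sketch the per-token routing and the cost of moving evicted experts back and forth are what make interconnect bandwidth, offloading-server placement, and disk speed the dominant latency factors, consistent with the findings stated in the abstract.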
Pages: 17