共 50 条
- [1] A Universal Approximation Theorem for Mixture-of-Experts Models [J]. NEURAL COMPUTATION, 2016, 28 (12) : 2585 - 2593
- [2] Spatial Mixture-of-Experts [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
- [3] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
- [4] New estimation and feature selection methods in mixture-of-experts models [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2010, 38 (04): : 519 - 539
- [7] Mixture-of-Experts with Expert Choice Routing [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
- [8] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models [J]. PROCEEDINGS OF THE 2023 ACM SIGCOMM 2023 CONFERENCE, SIGCOMM 2023, 2023, : 486 - 498
- [9] Efficient Routing in Sparse Mixture-of-Experts [J]. Shamsolmoali, Pourya (pshams55@gmail.com), 1600, Institute of Electrical and Electronics Engineers Inc.
- [10] MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16067 - 16075