50 entries in total
- [1] Scaling Vision-Language Models with Sparse Mixture of Experts. Findings of the Association for Computational Linguistics: EMNLP 2023, 2023: 11329-11344
- [3] Adaptive Gating in Mixture-of-Experts based Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023: 3577-3587
- [4] Dialogue Summarization with Mixture of Experts based on Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, 2024: 7143-7155
- [5] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. International Conference on Machine Learning, Vol. 162, 2022
- [6] Sparse Bayesian Hierarchical Mixture of Experts. 2011 IEEE Statistical Signal Processing Workshop (SSP), 2011: 653-656
- [7] Scaling Vision with Sparse Mixture of Experts. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021
- [8] On the Representation Collapse of Sparse Mixture of Experts. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
- [9] Harnessing the Power of Prompt Experts: Efficient Knowledge Distillation for Enhanced Language Understanding. Machine Learning and Knowledge Discovery in Databases: Research Track and Demo Track, Pt. VIII (ECML PKDD 2024), 2024, 14948: 218-234
- [10] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models. Proceedings of the 2023 ACM SIGCOMM Conference (SIGCOMM 2023), 2023: 486-498