共 50 条
- [11] A fine-grained GPU sharing and job scheduling for deep learning jobs on the cloud JOURNAL OF SUPERCOMPUTING, 2025, 81 (02):
- [12] Crux: GPU-Efficient Communication Scheduling for Deep Learning Training PROCEEDINGS OF THE 2024 ACM SIGCOMM 2024 CONFERENCE, ACM SIGCOMM 2024, 2024, : 1 - 15
- [13] Benchmarking Resource Usage for Efficient Distributed Deep Learning 2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC), 2022,
- [14] Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters EUROSYS '18: PROCEEDINGS OF THE THIRTEENTH EUROSYS CONFERENCE, 2018,
- [17] Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
- [18] TensorExpress: In-Network Communication Scheduling for Distributed Deep Learning 2020 IEEE 13TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2020), 2020, : 25 - 27
- [19] On Scheduling Ring-All-Reduce Learning Jobs in Multi-Tenant GPU Clusters with Communication Contention PROCEEDINGS OF THE 2022 THE TWENTY-THIRD INTERNATIONAL SYMPOSIUM ON THEORY, ALGORITHMIC FOUNDATIONS, AND PROTOCOL DESIGN FOR MOBILE NETWORKS AND MOBILE COMPUTING, MOBIHOC 2022, 2022, : 21 - 30