Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing

Cited by: 0

Authors:

Yu, Yongbo [1]

Yu, Fuxun [1]

Xu, Zirui [1]

Wang, Di [2]

Zhang, Mingjia [2]

Li, Ang [3]

Bray, Shawn [4]

Liu, Chenchen [4]

Chen, Xiang [1]

Affiliations:

[1] George Mason Univ, Fairfax, VA 22030 USA

[2] Microsoft, Redmond, WA USA

[3] Duke Univ, Durham, NC USA

[4] Univ Maryland, Baltimore, MD USA

Source:

COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION | 2022

Keywords:

Federated Learning; Multi-Task Learning; GPU Resource Allocation;

DOI:

10.1145/3487553.3524859

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Federated learning (FL) increasingly involves compound learning tasks as the complexity of cognitive applications grows. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection and classification) and expects FL to retain life-long intelligence involvement. However, our analysis demonstrates that two issues arise when compound FL models are deployed for multiple training tasks on a GPU: (1) because different tasks' skewed data distributions and corresponding models cause highly imbalanced learning workloads, current GPU scheduling methods fail to allocate resources effectively; (2) consequently, existing FL schemes, which focus only on heterogeneous data distribution and not on runtime computing, cannot practically achieve optimally synchronized federation. To address these issues, we propose a full-stack FL optimization scheme that covers both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Specifically, our work illustrates two key insights in this research domain: (1) competitive resource sharing is beneficial for parallel model executions, and the proposed concept of "virtual resource" can effectively characterize and guide practical per-task resource utilization and allocation; (2) FL can be further improved by taking architectural-level coordination into consideration. Our experiments demonstrate that FL throughput can be significantly increased.

Pages: 567-571

Page count: 5
