Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing

Authors
Yu, Yongbo [1 ]
Yu, Fuxun [1 ]
Xu, Zirui [1 ]
Wang, Di [2 ]
Zhang, Mingjia [2 ]
Li, Ang [3 ]
Bray, Shawn [4 ]
Liu, Chenchen [4 ]
Chen, Xiang [1 ]
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Microsoft, Redmond, WA USA
[3] Duke Univ, Durham, NC USA
[4] Univ Maryland, Baltimore, MD USA
Keywords
Federated Learning; Multi-Task Learning; GPU Resource Allocation
DOI
10.1145/3487553.3524859
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL) increasingly involves compound learning tasks as the complexity of cognitive applications grows. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection and classification) and expects FL to sustain life-long intelligence involvement. However, our analysis demonstrates that two issues arise when deploying compound FL models for multiple training tasks on a GPU: (1) Because different tasks' skewed data distributions and corresponding models cause highly imbalanced learning workloads, current GPU scheduling methods fail to allocate resources effectively; (2) Consequently, existing FL schemes, which focus only on heterogeneous data distribution and not on runtime computing, cannot practically achieve optimally synchronized federation. To address these issues, we propose a full-stack FL optimization scheme that covers both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Specifically, our work illustrates two key insights in this research domain: (1) Competitive resource sharing is beneficial for parallel model executions, and the proposed concept of "virtual resource" could effectively characterize and guide the practical per-task resource utilization and allocation. (2) FL could be further improved by taking architecture-level coordination into consideration. Our experiments demonstrate that FL throughput could be significantly increased.
Pages: 567-571
Number of pages: 5