Distributed Computing and Inference for Big Data

Cited by: 1
Authors
Zhou, Ling [1, 2]
Gong, Ziyang [1, 2]
Xiang, Pengcheng [1, 2]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program
Keywords
communication efficiency; distributed learning; federated learning; heterogeneity; statistical equivalence; divide-and-conquer; convergence; algorithms; efficiency; framework
DOI
10.1146/annurev-statistics-040522-021241
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Data are often distributed across different sites because of computing facility limitations or data privacy considerations. Conventional centralized methods, in which all datasets are stored and processed in a central computing facility, are then not applicable in practice. It has therefore become necessary to develop distributed learning approaches that achieve good inferential or predictive accuracy without access to individual-level data, or while complying with policies and regulations that protect privacy. In this article, we introduce the basic idea of distributed learning and selectively review various distributed learning methods, which are categorized by their statistical accuracy, computational efficiency, heterogeneity, and privacy. This categorization can help evaluate newly proposed methods from different aspects. Moreover, we provide up-to-date descriptions of the existing theoretical results that cover statistical equivalency and computational efficiency under different statistical learning frameworks. Finally, we list existing software implementations and benchmark datasets, and we discuss future research opportunities.
Pages: 533-551
Page count: 19