Distributed Computing and Inference for Big Data

Cited by: 1
Authors
Zhou, Ling [1, 2]
Gong, Ziyang [1, 2]
Xiang, Pengcheng [1, 2]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
communication efficiency; distributed learning; federated learning; heterogeneity; statistical equivalence; DIVIDE-AND-CONQUER; CONVERGENCE; ALGORITHMS; EFFICIENCY; FRAMEWORK;
DOI
10.1146/annurev-statistics-040522-021241
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Data are distributed across different sites due to computing facility limitations or data privacy considerations. Conventional centralized methods (those in which all datasets are stored and processed in a central computing facility) are not applicable in practice. Therefore, it has become necessary to develop distributed learning approaches that have good inference or predictive accuracy while remaining free of individual data or obeying policies and regulations to protect privacy. In this article, we introduce the basic idea of distributed learning and conduct a selected review on various distributed learning methods, which are categorized by their statistical accuracy, computational efficiency, heterogeneity, and privacy. This categorization can help evaluate newly proposed methods from different aspects. Moreover, we provide up-to-date descriptions of the existing theoretical results that cover statistical equivalency and computational efficiency under different statistical learning frameworks. Finally, we provide existing software implementations and benchmark datasets, and we discuss future research opportunities.
Pages: 533-551
Number of pages: 19
Related papers
50 in total
  • [41] Selective Inference with Distributed Data
    Liu, Sifan
    Panigrahi, Snigdha
    JOURNAL OF MACHINE LEARNING RESEARCH, 2025, 26
  • [42] Inference in distributed data clustering
    da Silva, Josenildo Costa
    Klusch, Matthias
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2006, 19 (04) : 363 - 369
  • [43] Inference on distributed data clustering
    da Silva, JC
    Klusch, M
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2005, 3587 : 610 - 619
  • [44] Cloud Computing and Big Data
    Hsu, Ching-Hsien
    Tang, Chunming
    Esteves, Rui M.
    JOURNAL OF INTERNET TECHNOLOGY, 2014, 15 (06) : 995 - 997
  • [45] Big data and cloud computing
    Shrestha, Rasu B.
    APPLIED RADIOLOGY, 2014, 43 (03) : 32 - 34
  • [46] Multimedia Big Data Computing
    Zhu, Wenwu
    Cui, Peng
    Wang, Zhi
    Hua, Gang
    IEEE MULTIMEDIA, 2015, 22 (03) : 96 - 105
  • [47] Exascale Computing and Big Data
    Reed, Daniel A.
    Dongarra, Jack
    COMMUNICATIONS OF THE ACM, 2015, 58 (07) : 56 - 68
  • [48] The anatomy of big data computing
    Kune, Raghavendra
    Konugurthi, Pramod Kumar
    Agarwal, Arun
    Chillarige, Raghavendra Rao
    Buyya, Rajkumar
    SOFTWARE-PRACTICE & EXPERIENCE, 2016, 46 (01) : 79 - 105
  • [49] Data Optimised Computing for Heterogeneous Big Data Computing Applications
    Yang, Erica
    Ross, Derek
    Nagella, Srikanth
    Turner, Martin
    Kockelmann, Winfried
    Burca, Genoveva
    Pouzols, Federico Montesino
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015 : 2817 - 2819
  • [50] An Extensible Approach to Searching and Selecting Data Sources for Materialized Big Data Integration in Distributed Computing Environments
    Sazontev, V. V.
    Stupnikov, S. A.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (02) : 147 - 156