Distributed Computing and Inference for Big Data

Cited by: 1
Authors
Zhou, Ling [1, 2]
Gong, Ziyang [1, 2]
Xiang, Pengcheng [1, 2]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
communication efficiency; distributed learning; federated learning; heterogeneity; statistical equivalence; DIVIDE-AND-CONQUER; CONVERGENCE; ALGORITHMS; EFFICIENCY; FRAMEWORK;
DOI
10.1146/annurev-statistics-040522-021241
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Data are distributed across different sites due to computing facility limitations or data privacy considerations. Conventional centralized methods (those in which all datasets are stored and processed in a central computing facility) are not applicable in practice. Therefore, it has become necessary to develop distributed learning approaches that have good inference or predictive accuracy while remaining free of individual data or obeying policies and regulations to protect privacy. In this article, we introduce the basic idea of distributed learning and conduct a selected review on various distributed learning methods, which are categorized by their statistical accuracy, computational efficiency, heterogeneity, and privacy. This categorization can help evaluate newly proposed methods from different aspects. Moreover, we provide up-to-date descriptions of the existing theoretical results that cover statistical equivalency and computational efficiency under different statistical learning frameworks. Finally, we provide existing software implementations and benchmark datasets, and we discuss future research opportunities.
Pages: 533 - 551
Number of pages: 19
Related papers
50 in total
  • [31] Intelligent cryptography approach for secure distributed big data storage in cloud computing
    Li, Yibin
    Gai, Keke
    Qiu, Longfei
    Qiu, Meikang
    Zhao, Hui
    INFORMATION SCIENCES, 2017, 387 : 103 - 115
  • [32] ClimateSpark: An in-memory distributed computing framework for big climate data analytics
    Hu, Fei
    Yang, Chaowei
    Schnase, John L.
    Duffy, Daniel Q.
    Xu, Mengchao
    Bowen, Michael K.
    Lee, Tsengdar
    Song, Weiwei
    COMPUTERS & GEOSCIENCES, 2018, 115 : 154 - 166
  • [33] Recent Developments in Parallel and Distributed Computing for Remotely Sensed Big Data Processing
    Wu, Zebin
    Sun, Jin
    Zhang, Yi
    Wei, Zhihui
    Chanussot, Jocelyn
    PROCEEDINGS OF THE IEEE, 2021, 109 (08) : 1282 - 1305
  • [34] Distributed Private Online Learning for Social Big Data Computing over Data Center Networks
    Li, Chencheng
    Zhou, Pan
    Zhou, Yingxue
    Bian, Kaigui
    Jiang, Tao
    Rahardja, Susanto
    2016 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2016, : 392 - 397
  • [35] Distributed computing and big data techniques for efficient fault detection and data management in wireless networks
    Kiran, Ajmeera
    Renjith, P. N.
    Gupta, Sapna
    Ambala, Srinivas
    Raju, Preethi Sambandam
    Sriramsetti, Drakshayani
    OPTICAL AND QUANTUM ELECTRONICS, 2023, 55 (13)
  • [37] E-commerce big data computing platform system based on distributed computing logistics information
    Hu, Junmin
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 6) : 13693 - 13702
  • [38] Smooth quantile regression and distributed inference for non-randomly stored big data
    Wang, Kangning
    Jia, Jiaojiao
    Polat, Kemal
    Sun, Xiaofei
    Alhudhaif, Adi
    Alenezi, Fayadh
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 215
  • [39] Application Of Cloud Computing In Biomedicine Big Data Analysis Cloud Computing In Big Data
    Yang, Tianyi
    Zhao, Yang
    2017 INTERNATIONAL CONFERENCE ON ALGORITHMS, METHODOLOGY, MODELS AND APPLICATIONS IN EMERGING TECHNOLOGIES (ICAMMAET), 2017,
  • [40] Big Data Analytics in Telecommunication using state-of-the-art Big Data Framework in a Distributed Computing Environment: A Case Study
    Ved, Mohit
    Rizwanahmed, B.
    2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2019, : 411 - 416