A Survey on Large-Scale Machine Learning

Cited by: 50
Authors
Wang, Meng [1, 2]
Fu, Weijie [1, 2]
He, Xiangnan [3]
Hao, Shijie [1, 2]
Wu, Xindong [1, 2]
Affiliations
[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[3] Univ Sci & Technol China, Hefei 230031, Anhui, Peoples R China
Keywords
Machine learning; Computational modeling; Optimization; Predictive models; Big Data; Computational complexity; Large-scale machine learning; efficient machine learning; big data analysis; efficiency; survey; GRAPH CONSTRUCTION; BIG DATA; OPTIMIZATION; ALGORITHMS;
DOI
10.1109/TKDE.2020.3015777
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches incur huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to efficiently learn patterns from big data with comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification, targeting computational complexity, 2) optimization approximation, targeting computational efficiency, and 3) computation parallelism, targeting computational capability. Then we categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions as well as open issues that are promising to address in the future.
Pages: 2574 - 2594
Page count: 21
Related papers
50 in total
  • [31] Angel: a new large-scale machine learning system
    Jie Jiang
    Lele Yu
    Jiawei Jiang
    Yuhong Liu
    Bin Cui
    National Science Review, 2018, 5 (02) : 216 - 236
  • [32] Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning
    Li, Side
    Kumar, Arun
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14 (11) : 2327 - 2340
  • [33] Toward Large-Scale Vulnerability Discovery using Machine Learning
    Grieco, Gustavo
    Grinblat, Guillermo Luis
    Uzal, Lucas
    Rawat, Sanjay
    Feist, Josselin
    Mounier, Laurent
    CODASPY'16: PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY, 2016 : 85 - 96
  • [34] Lotus: A New Topology for Large-scale Distributed Machine Learning
    Lu, Yunfeng
    Gu, Huaxi
    Yu, Xiaoshan
    Chakrabarty, Krishnendu
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2021, 17 (01)
  • [35] Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence
    Capuccini, Marco
    Carlsson, Lars
    Norinder, Ulf
    Spjuth, Ola
    2015 IEEE/ACM 2ND INTERNATIONAL SYMPOSIUM ON BIG DATA COMPUTING (BDC), 2015 : 61 - 67
  • [36] Security of NVMe Offloaded Data in Large-Scale Machine Learning
    Krauss, Torsten
    Goetz, Raphael
    Dmitrienko, Alexandra
    COMPUTER SECURITY - ESORICS 2023, PT IV, 2024, 14347 : 143 - 163
  • [37] Configuring large-scale storage using a middleware with machine learning
    Eyers, David M.
    Routray, Ramani
    Zhang, Rui
    Willcocks, Douglas
    Pietzuch, Peter
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (17) : 2063 - 2077
  • [38] Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (10) : 1 - 14
  • [39] A machine learning software for large-scale molecular and clinical data
    Pan, L.
    Mikolajczyk, K.
    Dimitrakopoulou-Strauss, A.
    Burger, C.
    Strauss, L.
    EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2007, 34 : S343 - S343
  • [40] Revisiting the Nyström Method for Improved Large-scale Machine Learning
    Gittens, Alex
    Mahoney, Michael W.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17