A Survey on Large-Scale Machine Learning

Cited by: 50
Authors
Wang, Meng [1, 2]
Fu, Weijie [1, 2]
He, Xiangnan [3]
Hao, Shijie [1, 2]
Wu, Xindong [1, 2]
Affiliations
[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[3] Univ Sci & Technol China, Hefei 230031, Anhui, Peoples R China
Keywords
Machine learning; Computational modeling; Optimization; Predictive models; Big Data; Computational complexity; Large-scale machine learning; efficient machine learning; big data analysis; efficiency; survey; GRAPH CONSTRUCTION; BIG DATA; OPTIMIZATION; ALGORITHMS
DOI
10.1109/TKDE.2020.3015777
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches incur huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently with comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification on computational complexities, 2) optimization approximation on computational efficiency, and 3) computation parallelism on computational capabilities. Then we categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions as well as open issues that are promising to address in the future.
Pages: 2574 - 2594
Page count: 21
Related papers
50 in total
  • [1] Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
    Giang Nguyen
    Stefan Dlugolinsky
    Martin Bobák
    Viet Tran
    Álvaro López García
    Ignacio Heredia
    Peter Malík
    Ladislav Hluchý
    Artificial Intelligence Review, 2019, 52 : 77 - 124
  • [2] Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
    Nguyen, Giang
    Dlugolinsky, Stefan
    Bobak, Martin
    Viet Tran
    Lopez Garcia, Alvaro
    Heredia, Ignacio
    Malik, Peter
    Hluchy, Ladislav
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (01) : 77 - 124
  • [3] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker
    Lee, Victor E.
    Shi, Feng
    Tang, Jiliang
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4788 - 4789
  • [4] Large-scale kernel extreme learning machine
    Deng, Wan-Yu
    Zheng, Qing-Hua
    Chen, Lin
    Jisuanji Xuebao/Chinese Journal of Computers, 2014, 37 (11) : 2235 - 2246
  • [5] Machine learning for large-scale MOF screening
    Coupry, Damien
    Groot, Laurens
    Addicoat, Matthew
    Heine, Thomas
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2017, 253
  • [6] Large-Scale Machine Learning and Neuroimaging in Psychiatry
    Thompson, Paul
    BIOLOGICAL PSYCHIATRY, 2018, 83 (09) : S51 - S51
  • [7] Coding for Large-Scale Distributed Machine Learning
    Xiao, Ming
    Skoglund, Mikael
    ENTROPY, 2022, 24 (09)
  • [8] Large-scale Machine Learning over Graphs
    Yang, Yiming
    PROCEEDINGS OF THE 2018 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'18), 2018, : 9 - 9
  • [9] Robust Large-Scale Machine Learning in the Cloud
    Rendle, Steffen
    Fetterly, Dennis
    Shekita, Eugene J.
    Su, Bor-yiing
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1125 - 1134
  • [10] Resource Elasticity for Large-Scale Machine Learning
    Huang, Botong
    Boehm, Matthias
    Tian, Yuanyuan
    Reinwald, Berthold
    Tatikonda, Shirish
    Reiss, Frederick R.
    SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 137 - 152