A Survey on Large-Scale Machine Learning

Cited by: 50
Authors
Wang, Meng [1, 2]
Fu, Weijie [1, 2]
He, Xiangnan [3]
Hao, Shijie [1, 2]
Wu, Xindong [1, 2]
Affiliations
[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[3] Univ Sci & Technol China, Hefei 230031, Anhui, Peoples R China
Keywords
Machine learning; Computational modeling; Optimization; Predictive models; Big Data; Computational complexity; Large-scale machine learning; efficient machine learning; big data analysis; efficiency; survey; GRAPH CONSTRUCTION; BIG DATA; OPTIMIZATION; ALGORITHMS
DOI
10.1109/TKDE.2020.3015777
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches incur huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently while retaining comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification, which reduces computational complexity; 2) optimization approximation, which improves computational efficiency; and 3) computation parallelism, which increases computational capability. We then categorize the methods within each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions as well as open issues that are promising to address in the future.
Pages: 2574 - 2594
Page count: 21
Related papers
50 in total
  • [41] Large-Scale Machine Learning and Optimization for Bioinformatics Data Analysis
    Cheng, Jianlin
    ACM-BCB 2020 - 11TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2020
  • [42] Large-Scale Machine Learning Algorithms for Biomedical Data Science
    Huang, Heng
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019 : 4 - 4
  • [43] Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML
    Boehm, Matthias
    Tatikonda, Shirish
    Reinwald, Berthold
    Sen, Prithviraj
    Tian, Yuanyuan
    Burdick, Douglas R.
    Vaithyanathan, Shivakumar
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (07) : 553 - 564
  • [44] ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning
    Kaltenborn, Julia
    Lange, Charlotte Emilie Elektra
    Ramesh, Venkatesh
    Brouillard, Philippe
    Gurwicz, Yaniv
    Nagda, Chandni
    Runge, Jakob
    Nowack, Peer
    Rolnick, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [45] Technical Perspective: Compressing Matrices for Large-Scale Machine Learning
    Ives, Zachary G.
    COMMUNICATIONS OF THE ACM, 2019, 62 (05) : 82 - 82
  • [46] A Universal Machine Learning Algorithm for Large-Scale Screening of Materials
    Fanourgakis, George S.
    Gkagkas, Konstantinos
    Tylianakis, Emmanuel
    Froudakis, George E.
    JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2020, 142 (08) : 3814 - 3822
  • [47] Digital Optical Neural Networks for Large-Scale Machine Learning
    Bernstein, Liane
    Sludds, Alexander
    Hamerly, Ryan
    Sze, Vivienne
    Emer, Joel
    Englund, Dirk
    2020 CONFERENCE ON LASERS AND ELECTRO-OPTICS (CLEO), 2020
  • [48] Compressed Linear Algebra for Declarative Large-Scale Machine Learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    COMMUNICATIONS OF THE ACM, 2019, 62 (05) : 83 - 91
  • [49] Human-Machine Cooperation in Large-Scale Multimedia Retrieval: A Survey
    Shirahama, Kimiaki
    Grzegorzek, Marcin
    Indurkhya, Bipin
    JOURNAL OF PROBLEM SOLVING, 2015, 8 (01) : 36 - 63
  • [50] A framework for generating large-scale microphone array data for machine learning
    Kujawski, Adam
    Pelling, Art J. R.
    Jekosch, Simon
    Sarradj, Ennes
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (11) : 31211 - 31231