A Bayesian perspective of statistical machine learning for big data

Cited by: 18
Authors
Sambasivan, Rajiv [1, 2]
Das, Sourish [1, 2]
Sahu, Sujit K. [1, 2]
Affiliations
[1] Chennai Math Inst, Chennai, Tamil Nadu, India
[2] Univ Southampton, Southampton, Hants, England
Keywords
Bayesian methods; Big data; Machine learning; Statistical learning; REGRESSION; OPTIMIZATION; SELECTION; INFERENCE; MODEL
DOI
10.1007/s00180-020-00970-8
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets, which are often very large. This task of discovering features from data is essentially what the keyword 'learning' means in SML. Theoretical justifications for the effectiveness of SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings justified in particular by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision-theoretic point of view, where we argue that many SML techniques are closely connected to making inference using the so-called Bayesian paradigm. We discuss many important SML techniques, such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of very large data sets, where these are often employed. We present a dictionary which maps the key concepts of SML between Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets, where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who aspire to understand and apply SML for moderately large to big data sets.
Pages: 893-930
Number of pages: 38