A Bayesian perspective of statistical machine learning for big data

Cited by: 18
Authors
Sambasivan, Rajiv [1 ,2 ]
Das, Sourish [1 ,2 ]
Sahu, Sujit K. [1 ,2 ]
Affiliations
[1] Chennai Mathematical Institute, Chennai, Tamil Nadu, India
[2] University of Southampton, Southampton, Hampshire, England
Keywords
Bayesian methods; Big data; Machine learning; Statistical learning; REGRESSION; OPTIMIZATION; SELECTION; INFERENCE; MODEL;
DOI
10.1007/s00180-020-00970-8
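Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes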
020208 ; 070103 ; 0714 ;
Abstract
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets which are often very large in size. The very task of feature discovery from data is essentially the meaning of the keyword 'learning' in SML. Theoretical justifications for the effectiveness of the SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings particularly justified by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision theoretic point of view, where we argue that many SML techniques are closely connected to making inference by using the so-called Bayesian paradigm. We discuss many important SML techniques such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML from Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who are aspiring to understand and apply SML for moderately large to big data sets.
Pages: 893 - 930
Page count: 38
Related papers
50 in total
  • [31] Machine Translation from Big Data Perspective
    Myrzakhmetov, Bagdat
    Yessenbayev, Zhandos
    Makazhanov, Aibek
    2017 11TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT 2017), 2017, : 35 - 39
  • [32] BAYESIAN STATISTICAL PARAMETRIC VERIFICATION AND SYNTHESIS BY MACHINE LEARNING
    Bortolussi, Luca
    Sanguinetti, Guido
    Silvetti, Simone
    2018 WINTER SIMULATION CONFERENCE (WSC), 2018, : 381 - 394
  • [33] Statistical Inference, Learning and Models in Big Data
    Franke, Beate
    Plante, Jean-Francois
    Roscher, Ribana
    Lee, En-Shiun Annie
    Smyth, Cathal
    Hatefi, Armin
    Chen, Fuqi
    Gil, Einat
    Schwing, Alexander
    Selvitella, Alessandro
    Hoffman, Michael M.
    Grosse, Roger
    Hendricks, Dieter
    Reid, Nancy
    INTERNATIONAL STATISTICAL REVIEW, 2016, 84 (03) : 371 - 389
  • [35] Review of Statistical Learning for Big, Dependent Data
    Matthews, Steve
    JOURNAL OF OFFICIAL STATISTICS, 2024, 40 (04) : 849 - 852
  • [36] Cyber risk prediction through social media big data analytics and statistical machine learning
    Subroto, Athor
    Apriyana, Andri
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [37] Machine Learning Techniques and Statistical Methods for Business Applications: Implications on Big Data Gold Rush
    Chun, Se-Hak
    ADVANCED SCIENCE LETTERS, 2018, 24 (07) : 5474 - 5477
  • [39] Telescopic broad Bayesian learning for big data stream
    Yuen, Ka-Veng
    Kuok, Sin-Chi
    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2025, 40 (01) : 33 - 53
  • [40] Machine learning on big data for future computing
    Jeong, Young-Sik
    Hassan, Houcine
    Sangaiah, Arun Kumar
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (06) : 2925 - 2929