A Bayesian perspective of statistical machine learning for big data

Cited by: 18
Authors
Sambasivan, Rajiv [1 ,2 ]
Das, Sourish [1 ,2 ]
Sahu, Sujit K. [1 ,2 ]
Affiliations
[1] Chennai Math Inst, Chennai, Tamil Nadu, India
[2] Univ Southampton, Southampton, Hants, England
Keywords
Bayesian methods; Big data; Machine learning; Statistical learning; REGRESSION; OPTIMIZATION; SELECTION; INFERENCE; MODEL;
DOI
10.1007/s00180-020-00970-8
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets, which are often very large. This task of discovering features from data is essentially what the keyword 'learning' means in SML. Theoretical justifications for the effectiveness of SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings justified specifically by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision-theoretic point of view, where we argue that many SML techniques are closely connected to making inference using the so-called Bayesian paradigm. We discuss many important SML techniques, such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of the very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML between Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets, where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who aspire to understand and apply SML to moderately large to big data sets.
Pages: 893 - 930
Number of pages: 38
Related papers
50 in total
  • [21] Perspective: Big Data and Machine Learning Could Help Advance Nutritional Epidemiology
    Morgenstern, Jason D.
    Rosella, Laura C.
    Costa, Andrew P.
    de Souza, Russell J.
    Anderson, Laura N.
    ADVANCES IN NUTRITION, 2021, 12 (03) : 621 - 631
  • [22] Big Data and Machine Learning: A Resident's Perspective of the 2016 Intersociety Conference
    Gimarc, David C.
    Misono, Alexander S.
    JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY, 2018, 15 (01) : 114 - 115
  • [23] Machine learning for big data analytics
    Oja, E. (erkki.oja@aalto.fi), 1600, Springer Verlag (384):
  • [24] Big data and machine learning in health
    Carvalho, D.
    Cruz, R.
    EUROPEAN JOURNAL OF PUBLIC HEALTH, 2020, 30 : 10 - 11
  • [25] Machine learning and big scientific data
    Hey, Tony
    Butler, Keith
    Jackson, Sam
    Thiyagalingam, Jeyarajan
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2020, 378 (2166):
  • [26] Machine Learning under Big Data
    Shi, Chunhe
    Wu, Chengdong
    Han, Xiaowei
    Xie, Yinghong
    Li, Zhen
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40 : 301 - 305
  • [27] Machine learning, big data, and neuroscience
    Pillow, Jonathan
    Sahani, Maneesh
    CURRENT OPINION IN NEUROBIOLOGY, 2019, 55 : III - IV
  • [28] Statistical inference and machine learning for big data by Mayer Alvo, Springer Cham.
    Chen, Li-Pang
    BIOMETRICS, 2023, 79 (04) : 4013 - 4013
  • [29] Advanced Machine Learning & Statistical Inference Approaches for Big Data Analytics and Information Fusion
    Mehra, Raman K.
    Gandhe, Avinash
    Mansinghka, Vikash
    Shafto, Patrick
    Lovell, Dan
    Yu, Ssu-Hsin
    SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION XXII, 2013, 8745
  • [30] Advanced Machine Learning and Statistical Inference Approaches for Big Data Analytics and Information Fusion
    Mehra, Raman K.
    Gandhe, Avinash
    Mansinghka, Vikash
    Shafto, Patrick
    Lovell, Dan
    Yu, Ssu-Hsin
    SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION XXII, 2013, 8745