Towards algorithmic analytics for large-scale datasets

Cited by: 55
Authors
Bzdok, Danilo [1, 2, 3]
Nichols, Thomas E. [4, 5]
Smith, Stephen M. [4]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Psychiat Psychotherapy & Psychosomat, Aachen, Germany
[2] JARA, Translat Brain Med, Aachen, Germany
[3] CEA Saclay, Neurospin, INRIA, Parietal Team, Gif Sur Yvette, France
[4] Univ Oxford, Wellcome Trust Ctr Integrat Neuroimaging WIN FMRI, Oxford, England
[5] Univ Oxford, Big Data Inst, Oxford, England
Keywords
BAYESIAN-INFERENCE; PERMUTATION TESTS; BRAIN; CONNECTIVITY; MODELS; PARCELLATION; PITFALLS; PRIMER;
DOI
10.1038/s42256-019-0069-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classical statistical analysis in many empirical sciences has lagged behind modern trends in analytics for large-scale datasets. The authors discuss the influence of more variables, larger sample sizes, open data sources for analysis and assessment, and 'black box' prediction methods on the empirical sciences, and provide examples from imaging neuroscience. The traditional goal of quantitative analytics is to find simple, transparent models that generate explainable insights. In recent years, large-scale data acquisition enabled, for instance, by brain scanning and genomic profiling with microarray-type techniques, has prompted a wave of statistical inventions and innovative applications. Here we review some of the main trends in learning from 'big data' and provide examples from imaging neuroscience. Some main messages we find are that modern analysis approaches (1) tame complex data with parameter regularization and dimensionality-reduction strategies, (2) are increasingly backed up by empirical model validations rather than justified by mathematical proofs, (3) will compare against and build on open data and consortium repositories, as well as (4) often embrace more elaborate, less interpretable models to maximize prediction accuracy.
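As a rough illustration of messages (1) and (2) in the abstract, the sketch below shrinks a parameter with a ridge penalty and selects the penalty strength by held-out prediction error rather than by a closed-form argument. It is not taken from the paper: the one-predictor model, the function names (`ridge_slope`, `kfold_mse`, `select_penalty`), the candidate penalties, and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' method): one-predictor ridge
# regression without an intercept, validated by k-fold cross-validation.
# All names and data below are hypothetical.

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate of w in y = w * x with penalty lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)  # assumes sxx + lam > 0


def kfold_mse(xs, ys, lam, k=2):
    """Mean squared prediction error over held-out folds (interleaved split)."""
    n = len(xs)
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        w = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_errors.extend((ys[i] - w * xs[i]) ** 2 for i in test)
    return sum(sq_errors) / len(sq_errors)


def select_penalty(xs, ys, candidates, k=2):
    """Pick the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: kfold_mse(xs, ys, lam, k))


if __name__ == "__main__":
    # Toy data: roughly y = 2x with small hand-picked perturbations.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
    best = select_penalty(xs, ys, [0.0, 0.1, 1.0, 10.0], k=3)
    print("selected penalty:", best)
    print("fitted slope:", ridge_slope(xs, ys, best))
```

The model is justified here by how well it predicts data it was not fitted on. That is the empirical-validation stance the abstract contrasts with mathematical proof.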
Pages: 296-306
Page count: 11