Towards algorithmic analytics for large-scale datasets

Cited by: 55
Authors
Bzdok, Danilo [1, 2, 3]
Nichols, Thomas E. [4, 5]
Smith, Stephen M. [4]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Psychiat Psychotherapy & Psychosomat, Aachen, Germany
[2] JARA, Translat Brain Med, Aachen, Germany
[3] CEA Saclay, Neurospin, INRIA, Parietal Team, Gif Sur Yvette, France
[4] Univ Oxford, Wellcome Trust Ctr Integrat Neuroimaging WIN FMRI, Oxford, England
[5] Univ Oxford, Big Data Inst, Oxford, England
Keywords
BAYESIAN-INFERENCE; PERMUTATION TESTS; BRAIN; CONNECTIVITY; MODELS; PARCELLATION; PITFALLS; PRIMER
DOI
10.1038/s42256-019-0069-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classical statistical analysis in many empirical sciences has lagged behind modern trends in analytics for large-scale datasets. The authors discuss the influence of more variables, larger sample sizes, open data sources for analysis and assessment, and 'black box' prediction methods on the empirical sciences, and provide examples from imaging neuroscience. The traditional goal of quantitative analytics is to find simple, transparent models that generate explainable insights. In recent years, large-scale data acquisition enabled, for instance, by brain scanning and genomic profiling with microarray-type techniques, has prompted a wave of statistical inventions and innovative applications. Here we review some of the main trends in learning from 'big data' and provide examples from imaging neuroscience. Some main messages we find are that modern analysis approaches (1) tame complex data with parameter regularization and dimensionality-reduction strategies, (2) are increasingly backed up by empirical model validations rather than justified by mathematical proofs, (3) will compare against and build on open data and consortium repositories, as well as (4) often embrace more elaborate, less interpretable models to maximize prediction accuracy.
Pages: 296 - 306
Number of pages: 11
Related papers
  • [1] Towards algorithmic analytics for large-scale datasets
    Danilo Bzdok
    Thomas E. Nichols
    Stephen M. Smith
    [J]. Nature Machine Intelligence, 2019, 1 : 296 - 306
  • [2] Towards Matching User Mobility Traces in Large-Scale Datasets
    Kondor, Daniel
    Hashemian, Behrooz
    de Montjoye, Yves-Alexandre
    Ratti, Carlo
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (04) : 714 - 726
  • [3] Towards efficient data search and subsetting of large-scale atmospheric datasets
    Pallickara, Sangmi Lee
    Pallickara, Shrideep
    Zupanski, Milija
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE, 2012, 28 (01) : 112 - 118
  • [4] Big Data Analytics on Large-Scale Scientific Datasets in the INDIGO-DataCloud Project
    Fiore, Sandro
    Palazzo, Cosimo
    D'Anca, Alessandro
    Elia, Donatello
    Londero, Elisa
    Knapic, Cristina
    Monna, Stephen
    Marcucci, Nicola M.
    Aguilar, Fernando
    Plociennik, Marcin
    De Lucas, Jesus E. Marco
    Aloisio, Giovanni
    [J]. ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2017, 2017 : 343 - 348
  • [5] Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark
    Alkowaileet, Wail Y.
    Alsubaiee, Sattam
    Carey, Michael J.
    Westmann, Till
    Bu, Yingyi
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (13) : 1585 - 1588
  • [6] GeoLens: Enabling Interactive Visual Analytics over Large-scale, Multidimensional Geospatial Datasets
    Koontz, Jared
    Malensek, Matthew
    Pallickara, Sangmi Lee
    [J]. 2014 IEEE/ACM INTERNATIONAL SYMPOSIUM ON BIG DATA COMPUTING (BDC), 2014 : 35 - 44
  • [7] Learning to Index in Large-Scale Datasets
    Prayoonwong, Amorntip
    Wang, Cheng-Hsien
    Chiu, Chih-Yi
    [J]. MULTIMEDIA MODELING, MMM 2018, PT I, 2018, 10704 : 305 - 316
  • [8] Visualization of large-scale trajectory datasets
    Zachar, Gergely
    [J]. 2023 CYBER-PHYSICAL SYSTEMS AND INTERNET-OF-THINGS WEEK, CPS-IOT WEEK WORKSHOPS, 2023 : 152 - 157
  • [9] Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
    Liao, Yuan-Hong
    Kar, Amlan
    Fidler, Sanja
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 4348 - 4357
  • [10] Large-Scale Graph Visualization and Analytics
    Ma, Kwan-Liu
    Muelder, Chris W.
    [J]. COMPUTER, 2013, 46 (07) : 39 - 46