Towards algorithmic analytics for large-scale datasets

Cited by: 55

Authors:

Bzdok, Danilo [1,2,3]

Nichols, Thomas E. [4,5]

Smith, Stephen M. [4]

Affiliations:

[1] Rhein Westfal TH Aachen, Dept Psychiat Psychotherapy & Psychosomat, Aachen, Germany

[2] JARA, Translat Brain Med, Aachen, Germany

[3] CEA Saclay, Neurospin, INRIA, Parietal Team, Gif Sur Yvette, France

[4] Univ Oxford, Wellcome Trust Ctr Integrat Neuroimaging WIN FMRI, Oxford, England

[5] Univ Oxford, Big Data Inst, Oxford, England

Source:

NATURE MACHINE INTELLIGENCE | 2019 / Volume 1 / Issue 07

Keywords:

BAYESIAN-INFERENCE; PERMUTATION TESTS; BRAIN; CONNECTIVITY; MODELS; PARCELLATION; PITFALLS; PRIMER;

DOI:

10.1038/s42256-019-0069-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Classical statistical analysis in many empirical sciences has lagged behind modern trends in analytics for large-scale datasets. The authors discuss the influence of more variables, larger sample sizes, open data sources for analysis and assessment, and 'black box' prediction methods on the empirical sciences, and provide examples from imaging neuroscience. The traditional goal of quantitative analytics is to find simple, transparent models that generate explainable insights. In recent years, large-scale data acquisition enabled, for instance, by brain scanning and genomic profiling with microarray-type techniques, has prompted a wave of statistical inventions and innovative applications. Here we review some of the main trends in learning from 'big data' and provide examples from imaging neuroscience. Some main messages we find are that modern analysis approaches (1) tame complex data with parameter regularization and dimensionality-reduction strategies, (2) are increasingly backed up by empirical model validations rather than justified by mathematical proofs, (3) will compare against and build on open data and consortium repositories, as well as (4) often embrace more elaborate, less interpretable models to maximize prediction accuracy.

Citation

Pages: 296-306

Page count: 11

