Towards algorithmic analytics for large-scale datasets

Cited by: 55
Authors
Bzdok, Danilo [1, 2, 3]
Nichols, Thomas E. [4, 5]
Smith, Stephen M. [4]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Psychiat Psychotherapy & Psychosomat, Aachen, Germany
[2] JARA, Translat Brain Med, Aachen, Germany
[3] CEA Saclay, Neurospin, INRIA, Parietal Team, Gif Sur Yvette, France
[4] Univ Oxford, Wellcome Trust Ctr Integrat Neuroimaging WIN FMRI, Oxford, England
[5] Univ Oxford, Big Data Inst, Oxford, England
Keywords
BAYESIAN-INFERENCE; PERMUTATION TESTS; BRAIN; CONNECTIVITY; MODELS; PARCELLATION; PITFALLS; PRIMER;
DOI
10.1038/s42256-019-0069-5
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classical statistical analysis in many empirical sciences has lagged behind modern trends in analytics for large-scale datasets. The authors discuss the influence of more variables, larger sample sizes, open data sources for analysis and assessment, and 'black box' prediction methods on the empirical sciences, and provide examples from imaging neuroscience. The traditional goal of quantitative analytics is to find simple, transparent models that generate explainable insights. In recent years, large-scale data acquisition enabled, for instance, by brain scanning and genomic profiling with microarray-type techniques, has prompted a wave of statistical inventions and innovative applications. Here we review some of the main trends in learning from 'big data' and provide examples from imaging neuroscience. Some main messages we find are that modern analysis approaches (1) tame complex data with parameter regularization and dimensionality-reduction strategies, (2) are increasingly backed up by empirical model validations rather than justified by mathematical proofs, (3) will compare against and build on open data and consortium repositories, as well as (4) often embrace more elaborate, less interpretable models to maximize prediction accuracy.
Pages: 296-306
Page count: 11