Machine learning and big scientific data

Cited by: 46
Authors
Hey, Tony [1]
Butler, Keith [1]
Jackson, Sam [1]
Thiyagalingam, Jeyarajan [1]
Affiliation
[1] Rutherford Appleton Lab, Sci & Technol Facil Council, Sci Comp Dept, Didcot OX11 0QX, Oxon, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
machine learning; materials science; atmospheric science; electron microscopy; image processing; AI benchmarks; CONVOLUTIONAL NEURAL-NETWORK; BAYESIAN CLOUD DETECTION; 2-DIMENSIONAL ANTIFERROMAGNETS; IMAGERY; VALIDATION; DESIGN;
DOI
10.1098/rsta.2019.0054
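Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;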
Abstract
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such 'Big Scientific Data' comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility and the UK's Central Laser Facility. Increasingly, scientists are now required to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now used the deep learning technology to develop their AlphaFold tool to make predictions for protein folding. Remarkably, it has been able to achieve some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the RAL, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from several different scientific domains. We conclude with some initial examples of our 'scientific machine learning' benchmark suite and of the research challenges these benchmarks will enable. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Pages: 23