Machine learning and big scientific data

Cited by: 46
Authors
Hey, Tony [1]
Butler, Keith [1]
Jackson, Sam [1]
Thiyagalingam, Jeyarajan [1]
Affiliations
[1] Rutherford Appleton Lab, Sci & Technol Facil Council, Sci Comp Dept, Didcot OX11 0QX, Oxon, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
machine learning; materials science; atmospheric science; electron microscopy; image processing; AI benchmarks; convolutional neural network; Bayesian cloud detection; 2-dimensional antiferromagnets; imagery; validation; design
DOI
10.1098/rsta.2019.0054
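Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract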
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such 'Big Scientific Data' comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility and the UK's Central Laser Facility. Increasingly, scientists are now required to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now used the deep learning technology to develop their AlphaFold tool to make predictions for protein folding. Remarkably, it has been able to achieve some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the RAL, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from several different scientific domains. We conclude with some initial examples of our 'scientific machine learning' benchmark suite and of the research challenges these benchmarks will enable. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Pages: 23