Integrating 'Big' Geoscience Data into the Petascale National Environmental Research Interoperability Platform (NERDIP): successes and unforeseen challenges

Authors
Wyborn, Lesley [1]
Evans, Benjamin J. K. [1]
Affiliation
[1] Australian National University, National Computational Infrastructure, Canberra, ACT, Australia
Keywords
Data-intensive Science; High Performance Data; High Performance Computing; Big Data; Geosciences; Data Platforms
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Australian Government has begun an initiative to organise publicly funded national data assets and make them accessible for research through the Research Data Services (RDS) initiative, which supports over 40 PBytes of multidisciplinary data at eight nodes around Australia. One of these nodes is the National Computational Infrastructure (NCI), which provides a comprehensively integrated national high performance computing facility. NCI is a partnership between the ANU, the Australian Bureau of Meteorology, Geoscience Australia (GA) and the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and it focuses particularly on the Earth system sciences. As part of its activity in RDS, NCI has co-located over 10 PBytes of priority research data collections spanning a wide range of disciplines, from geosciences, geophysics, environment, climate, weather and water resources through to astronomy, bioinformatics and the social sciences. To facilitate access, maximise reuse and enable integration across disciplines, these data have been built into a platform that NCI calls the National Environmental Research Data Interoperability Platform (NERDIP). The platform is co-located with significant HPC resources: a 1.2 petaflop supercomputer (Raijin) and an HPC-class, 3000-core OpenStack cloud system (Tenjin). Combined, they offer geoscience researchers unparalleled opportunities to undertake innovative Data-intensive Science at scales and resolutions never before attempted, and to take part in new interdisciplinary collaborations. However, compared with other 'Big Data' science disciplines (climate, oceans, weather, astronomy), current geoscience data management practices and data access methods need significant work before they can scale up and take advantage of changes in the global computing landscape.
Although the geosciences have many 'Big Data' collections that could be incorporated into NERDIP, these typically comprise heterogeneous files distributed over multiple sites and sectors. It is taking considerable time to aggregate them into large High Performance Data (HPD) sets that are structured to facilitate uptake in HPC environments. Once the data are incorporated into NERDIP, the next challenge is to ensure that researchers are ready both to use modern tools and to update their working practices so that they can process these data effectively. This is an issue partly because the geoscience community has been slow to move to peak-class systems for Data-intensive Science and to integrate with the rest of the Earth systems community.
Pages: 2005-2009
Number of pages: 5