Towards efficient data search and subsetting of large-scale atmospheric datasets

Cited by: 10
Authors
Pallickara, Sangmi Lee [1]
Pallickara, Shrideep [1]
Zupanski, Milija [2]
Affiliations
[1] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
[2] Colorado State Univ, Cooperat Inst Res Atmosphere, Ft Collins, CO 80523 USA
Funding
US National Science Foundation (NSF)
Keywords
Metadata; Discovery; Cloud computing; Atmospheric sciences; Large-scale datasets; Climate; Access
DOI
10.1016/j.future.2011.05.010
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Discovering the correct dataset efficiently is critical for effective simulations in the atmospheric sciences. Unlike text-based web documents, many large scientific datasets contain binary-encoded data that is difficult to discover with popular search engines. The atmospheric sciences have seen significant growth in public data hosting services. However, the ability to index and search these datasets has been limited by the metadata that the data hosts provide. We have developed an infrastructure, the Atmospheric Data Discovery System (ADDS), that provides an efficient data discovery environment for observational datasets in the atmospheric sciences. To support complex queries, we automatically extract and index fine-grained metadata. Datasets are indexed through periodic crawling of popular sites and also when users request files. Through our data customization feature, users can access subsets of a large dataset. Our focus is on the overall architecture, the data subsetting scheme, and a performance evaluation of our system. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 112-118
Number of pages: 7