Scientific Analysis by Queries in Extended SPARQL over a Scalable e-Science Data Store

Cited by: 3
Authors
Andrejev, Andrej [1]
Toor, Salman [1]
Hellander, Andreas [2]
Holmgren, Sverker [1]
Risch, Tore [1]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, SE-75105 Uppsala, Sweden
[2] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
DOI
10.1109/eScience.2013.19
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Data-intensive applications in e-Science require scalable storage solutions as well as interactive tools for analyzing scientific data. It is important to be able to query the data in a storage-independent way and to obtain the results of the data analysis incrementally (in contrast to traditional batch solutions). We use the RDF data model, extended with multidimensional numeric arrays, to represent the results, parameters, and other metadata describing scientific experiments, and SciSPARQL, an extension of the SPARQL language, to combine massive numeric array data and metadata in queries. To address the scalability problem, we present an architecture that enables the same SciSPARQL queries to be executed on the RDF dataset whether it is stored in a relational DBMS or mapped over a specialized, geographically distributed e-Science data store. To minimize access and communication costs, we represent the arrays with proxy objects and retrieve their content lazily. We formulate typical analysis tasks from a computational biology application as SciSPARQL queries and compare the query processing performance with manually written MATLAB scripts.
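The idea of representing arrays with proxy objects and retrieving their content lazily can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `ArrayProxy` class, the in-memory `STORE`, the `fetch_from_store` accessor, and the parameter name `k` are all hypothetical stand-ins chosen for illustration. The sketch only shows that a query filtering on metadata first needs to contact the array back end for the rows that survive the filter.

```python
from typing import Callable, Dict, List, Optional


class ArrayProxy:
    """Hypothetical stand-in for a stored array; content is fetched on first use."""

    def __init__(self, array_id: str, fetch: Callable[[str], List[float]]):
        self.array_id = array_id
        self._fetch = fetch            # assumed back-end accessor
        self._data: Optional[List[float]] = None
        self.fetch_count = 0           # number of back-end round trips made

    def _load(self) -> List[float]:
        # Contact the back end only once, and only when content is needed.
        if self._data is None:
            self._data = self._fetch(self.array_id)
            self.fetch_count += 1
        return self._data

    def __getitem__(self, index):
        return self._load()[index]

    def mean(self) -> float:
        data = self._load()
        return sum(data) / len(data)


# Hypothetical in-memory "data store" standing in for a remote back end.
STORE: Dict[str, List[float]] = {
    "run1/result": [1.0, 2.0, 3.0, 4.0],
    "run2/result": [10.0, 20.0],
}


def fetch_from_store(array_id: str) -> List[float]:
    return list(STORE[array_id])


# Metadata rows carry a cheap parameter value and a proxy instead of array content.
experiments = [
    {"name": "run1", "k": 0.5, "result": ArrayProxy("run1/result", fetch_from_store)},
    {"name": "run2", "k": 2.0, "result": ArrayProxy("run2/result", fetch_from_store)},
]


def mean_results(rows, max_k: float) -> Dict[str, float]:
    # Filter on metadata first; only surviving rows trigger an array fetch.
    return {r["name"]: r["result"].mean() for r in rows if r["k"] <= max_k}


means = mean_results(experiments, 1.0)
```

In this sketch, only the array for `run1` is fetched, because `run2` is excluded by the metadata filter before its proxy is ever dereferenced.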
Pages: 98-106
Number of pages: 9
Related papers
50 in total
  • [1] Scalable community-driven data sharing in e-science grids
    Scholl, Tobias
    Bauer, Bernhard
    Gufler, Benjamin
    Kuntschke, Richard
    Reiser, Angelika
    Kemper, Alfons
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE, 2009, 25 (03) : 290 - 300
  • [2] A scalable framework and prototype for CAS e-Science
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
    [J]. Data Sci. J., 2007, SUPPL. (S385-S392):
  • [3] Executing SPARQL Queries over the Web of Linked Data
    Hartig, Olaf
    Bizer, Christian
    Freytag, Johann-Christoph
    [J]. SEMANTIC WEB - ISWC 2009, PROCEEDINGS, 2009, 5823 : 293 - +
  • [4] Open data and scientific research: the organizational and regulatory framework of e-science
    Cassella, Maria
    [J]. AIB STUDI, 2013, 53 (03) : 223 - 238
  • [5] E-Science and the data deluge
    Casacuberta, David
    Vallverdu, Jordi
    [J]. PHILOSOPHICAL PSYCHOLOGY, 2014, 27 (01) : 126 - 140
  • [6] Collaborative e-Science Experiments and Scientific Workflows
    Belloum, Adam
    Inda, Marcia A.
    Vasunin, Dmitry
    Korkhov, Vladimir
    Zhao, Zhiming
    Rauwerda, Han
    Breit, Timo M.
    Bubak, Marian
    Hertzberger, Luis O.
    [J]. IEEE INTERNET COMPUTING, 2011, 15 (04) : 39 - 47
  • [7] Scalable long-term preservation of relational data through SPARQL queries
    Stefanova, Silvia
    Risch, Tore
    [J]. SEMANTIC WEB, 2016, 7 (02) : 117 - 137
  • [8] A survey of data provenance in e-science
    Simmhan, YL
    Plale, B
    Gannon, D
    [J]. SIGMOD RECORD, 2005, 34 (03) : 31 - 36
  • [9] CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources
    Darari, Fariz
    Prasojo, Radityo Eko
    Nutt, Werner
    [J]. SEMANTIC WEB: ESWC 2014 SATELLITE EVENTS, 2014, 8798 : 310 - 314
  • [10] WODII: a solution to process SPARQL queries over distributed data sources
    Ahmed Rabhi
    Rachida Fissoune
    [J]. Cluster Computing, 2020, 23 : 2315 - 2322