Scientific Analysis by Queries in Extended SPARQL over a Scalable e-Science Data Store

Cited by: 3

Authors:

Andrejev, Andrej [1]

Toor, Salman [1]

Hellander, Andreas [2]

Holmgren, Sverker [1]

Risch, Tore [1]

Affiliations:

[1] Uppsala Univ, Dept Informat Technol, Box 337, SE-75105 Uppsala, Sweden

[2] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA

Source:

2013 IEEE 9th International Conference on e-Science (e-Science), 2013

DOI:

10.1109/eScience.2013.19

CLC classification:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Data-intensive applications in e-Science require scalable solutions for storage as well as interactive tools for analysis of scientific data. It is important to be able to query the data in a storage-independent way, and to be able to obtain the results of the data-analysis incrementally (in contrast to traditional batch solutions). We use the RDF data model extended with multidimensional numeric arrays to represent the results, parameters, and other metadata describing scientific experiments, and SciSPARQL, an extension of the SPARQL language, to combine massive numeric array data and metadata in queries. To address the scalability problem we present an architecture that enables the same SciSPARQL queries to be executed on the RDF dataset whether it is stored in a relational DBMS or mapped over a specialized geographically distributed e-Science data store. In order to minimize access and communication costs, we represent the arrays with proxy objects, and retrieve their content lazily. We formulate typical analysis tasks from a computational biology application in terms of SciSPARQL queries, and compare the query processing performance with manually written scripts in MATLAB.
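The abstract's idea of representing arrays by proxy objects and retrieving their content lazily can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class ArrayProxy, the fetch_chunk callback, the chunk size and the in-memory make_remote_store stand-in are all hypothetical, and chunk-wise retrieval is an assumption added for illustration. The point it shows is that the proxy holds only metadata until a query touches elements, and then transfers only the chunks that the query actually needs.

```python
from typing import Callable, Dict, List


class ArrayProxy:
    """Hypothetical proxy for a numeric array held in a remote store.

    Holds only metadata (length, chunk size) plus a callback that fetches one
    chunk; element values are retrieved lazily, chunk by chunk, on first use.
    """

    def __init__(self, length: int, chunk_size: int,
                 fetch_chunk: Callable[[int], List[float]]) -> None:
        self.length = length
        self.chunk_size = chunk_size
        self._fetch_chunk = fetch_chunk        # assumed remote-store accessor
        self._cache: Dict[int, List[float]] = {}
        self.fetch_count = 0                   # number of chunks actually transferred

    def _chunk(self, chunk_no: int) -> List[float]:
        # Fetch a chunk only the first time it is needed, then reuse the cached copy.
        if chunk_no not in self._cache:
            self._cache[chunk_no] = self._fetch_chunk(chunk_no)
            self.fetch_count += 1
        return self._cache[chunk_no]

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> float:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self._chunk(i // self.chunk_size)[i % self.chunk_size]

    def sum_range(self, start: int, stop: int) -> float:
        # Aggregate over [start, stop); touches only the chunks overlapping the range.
        return sum(self[i] for i in range(start, stop))


def make_remote_store(data: List[float], chunk_size: int) -> Callable[[int], List[float]]:
    """Hypothetical in-memory substitute for a distributed e-Science store."""
    def fetch(chunk_no: int) -> List[float]:
        start = chunk_no * chunk_size
        return data[start:start + chunk_size]
    return fetch


if __name__ == "__main__":
    data = [float(x) for x in range(1000)]
    proxy = ArrayProxy(len(data), 100, make_remote_store(data, 100))
    print(proxy.sum_range(150, 250))   # 19950.0
    print(proxy.fetch_count)           # 2 (only chunks 1 and 2 were fetched)
```

In this sketch, constructing the proxy transfers nothing; the aggregate over elements 150 to 249 fetches just 2 of the 10 chunks, which mirrors the access- and communication-cost motivation stated in the abstract.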

Pages: 98-106

Page count: 9
