Efficient computation of comprehensive statistical information of large OWL datasets: a scalable approach

Cited by: 2

Authors:

Mohamed, Heba ^{[1,2]}

Fathalla, Said ^{[1,2]}

Lehmann, Jens ^{[1,3]}

Jabeen, Hajira ^{[4]}

Affiliations:

[1] Univ Bonn, Smart Data Analyt SDA, Bonn, Germany

[2] Univ Alexandria, Fac Sci, Alexandria, Egypt

[3] Fraunhofer IAIS, Dresden Lab, NetMedia Dept, Dresden, Germany

[4] GESIS Leibniz Inst Social Sci, Cologne, Germany

Source:

ENTERPRISE INFORMATION SYSTEMS | 2023 / Vol. 17 / Issue 07

Keywords:

Distributed processing; in-memory approach; SANSA framework; scalable architecture; Semantic Web; statistics computations; ontology; enterprise; scale

DOI:

10.1080/17517575.2022.2062683

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Computing statistics over a dataset is crucial for exploring its structure; however, it becomes challenging for large-scale datasets. Such statistics have several key benefits, including link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt at developing a distributed approach (OWLStats) for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed in-memory approach that uses Apache Spark to compute 50 statistical criteria for OWL datasets. We have successfully integrated OWLStats into the SANSA framework. Experimental results show that OWLStats scales linearly in terms of both node scalability and data scalability.


Pages: 21
