Efficient computation of comprehensive statistical information of large OWL datasets: a scalable approach

Cited by: 2
Authors
Mohamed, Heba [1, 2]
Fathalla, Said [1, 2]
Lehmann, Jens [1, 3]
Jabeen, Hajira [4]
Affiliations
[1] University of Bonn, Smart Data Analytics (SDA), Bonn, Germany
[2] Alexandria University, Faculty of Science, Alexandria, Egypt
[3] Fraunhofer IAIS, Dresden Lab, NetMedia Department, Dresden, Germany
[4] GESIS Leibniz Institute for the Social Sciences, Cologne, Germany
Keywords
Distributed processing; in-memory approach; SANSA framework; scalable architecture; Semantic Web; statistics computations; ontology; enterprise; scale
DOI
10.1080/17517575.2022.2062683
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Computing dataset statistics is crucial for exploring a dataset's structure; however, it becomes challenging for large-scale datasets. Such statistics have several key benefits, including link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present OWLStats, the first attempt at developing a distributed approach for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed in-memory approach that uses Apache Spark to compute 50 statistical criteria for OWL datasets. We have successfully integrated OWLStats into the SANSA framework. Experimental results demonstrate that OWLStats scales linearly with respect to both the number of nodes and the data size.
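The abstract does not show OWLStats' code, so the following is only a rough, single-machine illustration of the partition-then-merge pattern that a distributed statistics job typically follows: each data partition computes partial counts (map step), and the partial results are merged with an associative operation (reduce step). The axiom tuple format, the function names (`partition`, `partial_stats`, `compute_stats`) and the three statistics shown are hypothetical; they are not OWLStats' 50 criteria, its Spark code, or the SANSA API.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical axiom representation (an assumption, not OWLStats' data model):
# (axiom_type, *arguments), e.g. ("SubClassOf", ":Student", ":Person").
Axiom = Tuple[str, ...]


def partition(axioms: List[Axiom], n: int) -> List[List[Axiom]]:
    """Split the axioms into n chunks, standing in for distributed data partitions."""
    return [axioms[i::n] for i in range(n)]


def partial_stats(chunk: Iterable[Axiom]) -> Counter:
    """Map step: compute partial counts for a single partition."""
    stats: Counter = Counter()
    for axiom in chunk:
        kind = axiom[0]
        # Criterion 1 (illustrative): number of axioms per axiom type.
        stats[("axiom_type", kind)] += 1
        # Criterion 2 (illustrative): number of declared classes.
        if kind == "Declaration" and axiom[1] == "Class":
            stats["declared_classes"] += 1
        # Criterion 3 (illustrative): usage count of each object property.
        elif kind == "ObjectPropertyAssertion":
            stats[("property_usage", axiom[1])] += 1
    return stats


def compute_stats(axioms: List[Axiom], n_partitions: int = 4) -> Counter:
    """Reduce step: merge per-partition counters into global statistics."""
    partials = [partial_stats(chunk) for chunk in partition(axioms, n_partitions)]
    # Counter addition is associative, so the merge order does not matter.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    example = [
        ("Declaration", "Class", ":Person"),
        ("Declaration", "Class", ":Student"),
        ("SubClassOf", ":Student", ":Person"),
        ("ObjectPropertyAssertion", ":knows", ":alice", ":bob"),
        ("ObjectPropertyAssertion", ":knows", ":bob", ":carol"),
    ]
    print(compute_stats(example, n_partitions=2).most_common())
```

Because the merge operation is associative, the result is independent of how the axioms are partitioned. In Spark, the same shape would roughly correspond to emitting (key, 1) pairs per axiom and combining them with `reduceByKey` on an RDD, which is the property that lets such counting jobs scale out across nodes.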
Pages: 21