Provenance Network Analytics: An approach to data analytics using data provenance

Authors
Trung Dong Huynh
Mark Ebden
Joel Fischer
Stephen Roberts
Luc Moreau
Affiliations
[1] University of Southampton, Electronics and Computer Science
[2] University of Oxford, Information Engineering, Department of Engineering Science
[3] University of Nottingham, Mixed Reality Lab, School of Computer Science
[4] King’s College London, Department of Informatics
Keywords
Data provenance; Data analytics; Network metrics; Graph classification
Abstract
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data’s provenance as represented using the World Wide Web Consortium’s domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
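As a rough illustration of the idea in the abstract, the sketch below turns a small provenance graph into a vector of network metrics of the kind a classifier could be trained on. Everything in it is an assumption made for illustration: the toy graph, the function name `provenance_features`, and the choice of features (node and edge counts, counts per PROV node type, longest path) are hypothetical and are not taken from the paper. It also assumes the graph is acyclic.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical toy provenance graph; node kinds follow PROV's three core types.
NODES = {
    "report": "entity",
    "draft": "entity",
    "data": "entity",
    "edit": "activity",
    "compile": "activity",
    "alice": "agent",
}

# Directed edges as (source, target, PROV relation name).
EDGES = [
    ("report", "edit", "wasGeneratedBy"),
    ("edit", "draft", "used"),
    ("draft", "compile", "wasGeneratedBy"),
    ("compile", "data", "used"),
    ("edit", "alice", "wasAssociatedWith"),
    ("report", "draft", "wasDerivedFrom"),
]


def provenance_features(nodes, edges):
    """Return illustrative network metrics for an acyclic provenance graph."""
    successors = {name: [] for name in nodes}
    for source, target, _relation in edges:
        successors[source].append(target)

    @lru_cache(maxsize=None)
    def longest_from(name):
        # Longest path, counted in edges, starting at `name` (assumes no cycles).
        return max((1 + longest_from(nxt) for nxt in successors[name]), default=0)

    kinds = Counter(nodes.values())
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "entities": kinds["entity"],
        "activities": kinds["activity"],
        "agents": kinds["agent"],
        "longest_path": max(longest_from(name) for name in nodes),
    }


FEATURES = provenance_features(NODES, EDGES)
```

One such feature vector per provenance graph, paired with a label such as a quality rating, would then form the training set for a standard classifier; that step is left out here.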
Pages: 708-735
Page count: 27
Source: Data Mining and Knowledge Discovery, 2018, 32(3): 708-735