Provenance Network Analytics: An approach to data analytics using data provenance

Authors
Trung Dong Huynh
Mark Ebden
Joel Fischer
Stephen Roberts
Luc Moreau
Affiliations
[1] University of Southampton, Electronics and Computer Science
[2] University of Oxford, Information Engineering, Department of Engineering Science
[3] University of Nottingham, Mixed Reality Lab, School of Computer Science
[4] King’s College London, Department of Informatics
Keywords
Data provenance; Data analytics; Network metrics; Graph classification
DOI
Not available
Abstract
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data’s provenance as represented using the World Wide Web Consortium’s domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
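As a rough illustration of the idea described in the abstract, the sketch below turns a small provenance graph into a vector of network metrics that a standard classifier could consume. It is not the authors' implementation: the toy graph, the choice of metrics (node counts per PROV type, edge count, longest path), and the function names `longest_path_length` and `feature_vector` are all assumptions made for this example. Only the three node types (entity, activity, agent) and the relation names come from the PROV data model.

```python
from collections import Counter, defaultdict

# Hypothetical toy provenance graph: node id -> PROV type.
nodes = {
    "e1": "entity",
    "e2": "entity",
    "a1": "activity",
    "ag1": "agent",
}

# Directed edges (source, PROV relation, target), pointing from effect to cause.
edges = [
    ("e2", "wasGeneratedBy", "a1"),
    ("a1", "used", "e1"),
    ("e2", "wasDerivedFrom", "e1"),
    ("a1", "wasAssociatedWith", "ag1"),
]


def longest_path_length(nodes, edges):
    """Length, in edges, of the longest directed path (assumes an acyclic graph)."""
    successors = defaultdict(list)
    for source, _, target in edges:
        successors[source].append(target)

    cache = {}

    def depth(node):
        # Memoised depth-first search: longest path starting at `node`.
        if node not in cache:
            cache[node] = max((1 + depth(n) for n in successors[node]), default=0)
        return cache[node]

    return max((depth(n) for n in nodes), default=0)


def feature_vector(nodes, edges):
    """Summarise a provenance graph as a dictionary of simple network metrics."""
    type_counts = Counter(nodes.values())
    return {
        "entities": type_counts["entity"],
        "activities": type_counts["activity"],
        "agents": type_counts["agent"],
        "edges": len(edges),
        "longest_path": longest_path_length(nodes, edges),
    }


features = feature_vector(nodes, edges)
print(features)
```

Computing one such vector per provenance document, paired with a label such as the document's owner or a data-quality rating, would give the kind of training table to which established machine learning techniques can be applied.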
Pages: 708-735
Page count: 27