Improving Data Provenance Reconstruction via a Multi-Level Funneling Approach

Authors
Vasudevan, Subha [1]
Pfeffer, William [1]
Davis, Delmar [1]
Asuncion, Hazeline [1]
Affiliation
[1] Univ Washington, Sch Sci Technol Engn & Math, Bothell, WA 98011 USA
Funding
National Science Foundation (USA)
Keywords
data provenance; provenance reconstruction; Latent Dirichlet Allocation; Genetic Algorithm; Longest Common Subsequence; Statistical Re-clustering; Silhouette Coefficient
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The ease with which data can be created, copied, modified, and deleted over the Internet has made it increasingly difficult to determine the source of web data. Data provenance, which provides information about the origin and lineage of a dataset, helps determine its authenticity and trustworthiness. Several data provenance techniques record provenance when data is created or modified. However, many existing datasets have no recorded provenance. Provenance reconstruction techniques attempt to generate approximate provenance for these datasets. Current reconstruction techniques require timing metadata to reconstruct provenance. In this paper, we improve our multi-funneling technique, which combines existing techniques, including topic modeling, longest common subsequence, and a genetic algorithm, to achieve higher accuracy in reconstructing provenance without requiring timing metadata. In addition, we introduce novel funnels customized to the provided datasets, which further boost precision and recall. We evaluated our approach through various experiments and compared the results with those of existing techniques. Finally, we present lessons learned, including the applicability of our approach to other datasets.
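
The abstract names longest common subsequence (LCS) as one of the combined techniques. As a rough illustration only, the Python sketch below scores document pairs by word-level LCS length, normalized by the longer document, and keeps pairs at or above a threshold as candidate provenance links. The normalization, the 0.5 threshold, and the function names (lcs_length, lcs_similarity, candidate_links) are hypothetical choices for this example, not the authors' implementation. Like the approach described above, it uses no timing metadata, but it proposes only undirected links.

```python
from itertools import combinations


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (classic DP)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from one side
        prev = curr
    return prev[-1]


def lcs_similarity(doc_a: str, doc_b: str) -> float:
    """Hypothetical score: word-level LCS length divided by the longer document's length."""
    a, b = doc_a.split(), doc_b.split()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def candidate_links(docs: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Hypothetical funnel step: keep undirected document pairs whose score meets the threshold."""
    links = []
    for (name_a, text_a), (name_b, text_b) in combinations(docs.items(), 2):
        score = lcs_similarity(text_a, text_b)
        if score >= threshold:
            links.append((name_a, name_b, score))
    # Strongest candidate links first
    return sorted(links, key=lambda link: link[2], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "the quick brown fox jumps",
        "d2": "the quick brown dog jumps",
        "d3": "an unrelated note",
    }
    print(candidate_links(docs))
```

Here only d1 and d2 share a long word sequence, so they form the single candidate link. A multi-level design would pass such candidates on to further filters (for example, topic modeling or a genetic algorithm) rather than treat LCS alone as conclusive.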
Pages: 175-184
Number of pages: 10