Improving Data Provenance Reconstruction via a Multi-Level Funneling Approach

Cited by: 0
Authors
Vasudevan, Subha [1]
Pfeffer, William [1]
Davis, Delmar [1]
Asuncion, Hazeline [1]
Affiliations
[1] Univ Washington, Sch Sci Technol Engn & Math, Bothell, WA 98011 USA
Funding
U.S. National Science Foundation
Keywords
data provenance; provenance reconstruction; Latent Dirichlet Allocation; Genetic Algorithm; Longest Common Subsequence; Statistical Re-clustering; Silhouette Coefficient
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The ease with which data can be created, copied, modified, and deleted over the Internet has made it increasingly difficult to determine the source of web data. Data provenance, which provides information about the origin and lineage of a dataset, assists in determining its genuineness and trustworthiness. Several data provenance techniques record provenance when the data is created or modified. However, many existing datasets have no recorded provenance. Provenance reconstruction techniques attempt to generate an approximate provenance for these datasets. Current reconstruction techniques require timing metadata to reconstruct provenance. In this paper, we improve our multi-funneling technique, which combines existing techniques, including topic modeling, longest common subsequence, and a genetic algorithm, to achieve higher accuracy in reconstructing provenance without requiring timing metadata. In addition, we introduce novel funnels that are customized to the provided datasets, which further boost precision and recall rates. We evaluate our approach with various experiments and compare its results with those of existing techniques. Finally, we present lessons learned, including the applicability of our approach to other datasets.
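One of the techniques the abstract names is longest common subsequence (LCS). As a rough illustration only, and not the authors' implementation, the Python sketch below computes an LCS-based similarity score between two texts. The function names and the normalization by the longer input are assumptions made for this example; the record does not describe how the paper scores or thresholds similarity.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the subsequence found for the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: LCS length divided by the longer length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0
```

In a funnel-style pipeline, a score like this could be used to keep only candidate document pairs whose text overlap is high enough to suggest that one was derived from the other; any cutoff would have to be tuned per dataset.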
Pages: 175-184
Page count: 10