Big Data Provenance: Challenges and Implications for Benchmarking

Author
Glavic, Boris [1]
Affiliation
[1] IIT, Chicago, IL 60615 USA
Source
Keywords
Big Data; Benchmarking; Data Provenance
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data provenance is information about the origin and creation process of data. Such information is useful for debugging data and transformations, auditing, evaluating the quality of and trust in data, modelling authenticity, and implementing access control for derived data. Provenance has been studied by the database, workflow, and distributed systems communities, but provenance for Big Data, which we refer to as Big Provenance, is a largely unexplored field. This paper reviews existing approaches for large-scale distributed provenance and discusses potential challenges for Big Data benchmarks that aim to incorporate provenance data and provenance management. Furthermore, we examine how Big Data benchmarking could benefit from different types of provenance information. We argue that provenance can be used to identify and analyze performance bottlenecks, to compute performance metrics, and to test a system's ability to exploit commonalities in data and processing.
Pages: 72-80
Page count: 9