Efficient Provenance Management via Clustering and Hybrid Storage in Big Data Environments

Cited by: 8
Authors
Hu, Die [1]
Feng, Dan [1]
Xie, Yulai [2]
Xu, Gongming [2]
Gu, Xinrui [1]
Long, Darrell [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Sch Comp, Key Lab Informat Storage Syst,Minist Educ China, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Sch Comp, Wuhan 430074, Peoples R China
[3] Univ Calif Santa Cruz, Jack Baskin Sch Engn, Santa Cruz, CA 95064 USA
Funding
US National Science Foundation (NSF)
Keywords
Big data; provenance management; clustering; hybrid storage; compress
DOI
10.1109/TBDATA.2019.2907116
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Provenance is a type of metadata that records the creation and transformation of data objects. It has been applied in a wide variety of areas, such as security, search, and experimental documentation. However, provenance data is typically voluminous and grows rapidly, which hinders its effective extraction and application. This paper proposes an efficient provenance management system based on clustering and hybrid storage. Specifically, we propose a Provenance-Based Label Propagation Algorithm that can regularize and cluster large amounts of irregular provenance. We then use separate physical storage media, such as SSD and HDD, to store hot and cold data, and implement a hot/cold scheduling scheme that automatically updates and migrates data between them. In addition, we implement a feedback mechanism that locates and compresses rarely used cold data according to query requests. Experiments show that the system significantly improves provenance query performance with a small run-time overhead.
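The hot/cold separation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's scheduling scheme: the class name `TieredStore`, the `hot_capacity` parameter, and the promotion rule (rank by query count) are all assumptions made for illustration, and two in-memory dictionaries stand in for the SSD and HDD tiers.

```python
from collections import Counter


class TieredStore:
    """Toy two-tier store (hypothetical): 'hot' stands in for SSD, 'cold' for HDD."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity  # assumed limit on hot-tier entries
        self.hot = {}
        self.cold = {}
        self.hits = Counter()  # number of queries seen per key

    def put(self, key, value):
        # New records start in the cold tier; only queried records get promoted.
        self.cold[key] = value

    def get(self, key):
        # Look in the hot tier first, then fall back to the cold tier.
        value = self.hot[key] if key in self.hot else self.cold[key]
        self.hits[key] += 1
        return value

    def rebalance(self):
        """Place the most frequently queried keys in the hot tier, the rest in cold."""
        everything = {**self.cold, **self.hot}
        # Descending hit count, ties broken by key so the result is deterministic.
        ranked = sorted(everything, key=lambda k: (-self.hits[k], k))
        hot_keys = set(ranked[: self.hot_capacity])
        self.hot = {k: everything[k] for k in hot_keys}
        self.cold = {k: v for k, v in everything.items() if k not in hot_keys}
```

A periodic call to `rebalance()` plays the role of the scheduler here. A real system would also have to weigh migration cost and access recency, which this sketch ignores.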
Pages: 792-803
Number of pages: 12