A Near Real-Time Big Data Provenance Generation Method Based on the Conjoint Analysis of Heterogeneous Logs

Times cited: 0
Authors
Gao, Yuanzhao [1]
Chen, Xingyuan [1, 2]
Li, Binglong [1]
Du, Xuehui [1]
Affiliations
[1] Zhengzhou Sci & Technol Inst, Zhengzhou 450000, Peoples R China
[2] State Key Lab Cryptol, Beijing 100878, Peoples R China
Source
IEEE Access | 2023 / Vol. 11
Keywords
Big data provenance; provenance generation; multi-log conjoint analysis; Hadoop
DOI
10.1109/ACCESS.2023.3300844
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Data provenance is an effective approach to data security supervision. In distributed, multi-user, multi-layer big data systems, only provenance generation methods that leverage information logged at both the application level and the operating system level can completely obtain the provenance information required for data usage supervision. However, research on the conjoint analysis of multiple logs remains inadequate, and existing methods struggle to integrate the provenance information extracted from different logs effectively, especially in big data scenarios. To generate provenance in near real time from the analysis of multiple heterogeneous logs, this paper takes a Hadoop-based big data system as its research object and proposes a parallel log analysis method based on auxiliary data structures and multi-threading. For efficient conjoint analysis of multiple logs, five auxiliary data structures are constructed as the medium for correlating and fusing log information, and a multi-threading method is presented to parallelize the lookup of provenance information. To cope with complex log record generation rules, log analysis methods for nondeterministic records, non-instantaneous operations, and instantaneous batch operations are proposed so that provenance information is generated correctly. In addition, a provenance generation framework is built to implement the proposed log analysis method. The experimental results show that, for files of megabyte size and larger, the time overhead introduced by log collection is less than 0.1%. The proposed method can analyze logs in near real time and generate provenance information correctly.
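The abstract does not specify the five auxiliary data structures or the threading scheme. Purely as a rough illustration of the general idea (correlating records from two heterogeneous logs through an auxiliary index and parallelizing the lookups), the Python sketch below joins hypothetical application-level records with hypothetical OS-level records. It is not the authors' method: the record types, the fields `pid` and `ts`, the shared-process-ID join key, and the time window are all assumptions made for this example. In CPython the GIL limits the speedup of pure-Python lookups, so the thread pool here shows the structure rather than real performance.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    """Hypothetical application-level record (e.g. a storage-layer audit entry)."""
    user: str
    op: str
    path: str
    pid: int
    ts: float


@dataclass(frozen=True)
class OsRecord:
    """Hypothetical OS-level record (e.g. a file-access system call)."""
    pid: int
    syscall: str
    file: str
    ts: float


def build_index(os_records):
    """Auxiliary structure (assumed): pid -> (sorted timestamps, records in the same order)."""
    groups = defaultdict(list)
    for rec in os_records:
        groups[rec.pid].append(rec)
    index = {}
    for pid, recs in groups.items():
        recs.sort(key=lambda r: r.ts)
        index[pid] = ([r.ts for r in recs], recs)
    return index


def lookup(index, app, window):
    """Return OS records with the same pid whose timestamps lie in [app.ts, app.ts + window]."""
    entry = index.get(app.pid)
    if entry is None:
        return []
    times, recs = entry
    lo = bisect_left(times, app.ts)
    hi = bisect_right(times, app.ts + window)
    return recs[lo:hi]


def correlate(app_records, os_records, window=1.0, workers=4):
    """Fuse both logs into (user, op, path, os_file, syscall) provenance tuples."""
    index = build_index(os_records)  # built once, then only read by the workers

    def fuse(app):
        return [(app.user, app.op, app.path, o.file, o.syscall)
                for o in lookup(index, app, window)]

    # Each lookup only reads the shared index, so lookups can be dispatched to a pool;
    # pool.map keeps the results in the order of app_records.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fuse, app_records)
        return [edge for edges in results for edge in edges]
```

Keeping a sorted timestamp list per process ID lets each lookup find its time window with two binary searches instead of scanning the whole OS-level log, and because the index is built before any worker starts and is never modified afterwards, the lookups need no locking.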
Pages: 80806-80821
Number of pages: 16