A Near Real-Time Big Data Provenance Generation Method Based on the Conjoint Analysis of Heterogeneous Logs

Authors
Gao, Yuanzhao [1]
Chen, Xingyuan [1,2]
Li, Binglong [1]
Du, Xuehui [1]
Affiliations
[1] Zhengzhou Sci & Technol Inst, Zhengzhou 450000, Peoples R China
[2] State Key Lab Cryptol, Beijing 100878, Peoples R China
Source
IEEE ACCESS | 2023 / Vol. 11
Keywords
Big data provenance; provenance generation; multi-log conjoint analysis; Hadoop
DOI
10.1109/ACCESS.2023.3300844
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data provenance is an effective approach to data security supervision. In a distributed, multi-user, multi-layer big data system, only a provenance generation method that leverages information logged at both the application level and the operating system level can obtain all the provenance information required for data usage supervision. However, existing research on the conjoint analysis of multiple logs is inadequate: current methods struggle to integrate the provenance information extracted from different logs, especially in big data scenarios. To achieve near real-time provenance generation from multiple heterogeneous logs, this paper takes a Hadoop-based big data system as the research object and proposes a parallel log analysis method based on auxiliary data structures and multi-threading. For efficient conjoint analysis, five auxiliary data structures are constructed as the medium for correlating and fusing log information, and a multi-threading method is presented to parallelize the lookup of provenance information. To cope with complex log record generation rules, log analysis methods for nondeterministic records, non-instantaneous operations, and instantaneous batch operations are proposed so that provenance information is generated correctly. In addition, a provenance generation framework is established to implement the proposed log analysis method. Experimental results show that the log collection time overhead incurred when processing files above the MB level is less than 0.1%. The proposed method can analyze logs in near real time and generate provenance information correctly.
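The abstract does not describe the five auxiliary data structures or the log formats, so the following is only a minimal Python sketch of the general idea: an auxiliary index built from one log serves as the medium for correlating it with another, and a thread pool parallelizes the lookups. All record types, field names, and function names (`AppRecord`, `OsRecord`, `build_path_index`, `correlate`, `generate_provenance`) are hypothetical and are not taken from the paper or from Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# Hypothetical, simplified records; real application and OS logs have other fields.
@dataclass(frozen=True)
class AppRecord:
    """Application-level log entry (assumed shape)."""
    user: str
    op: str
    path: str


@dataclass(frozen=True)
class OsRecord:
    """Operating-system-level log entry (assumed shape)."""
    pid: int
    path: str
    host: str


def build_path_index(os_records):
    """Auxiliary structure: file path -> OS-level records touching that path."""
    index = {}
    for rec in os_records:
        index.setdefault(rec.path, []).append(rec)
    return index


def correlate(app_rec, path_index):
    """Fuse one application-level record with the matching OS-level records."""
    matches = path_index.get(app_rec.path, [])
    return {
        "user": app_rec.user,
        "op": app_rec.op,
        "path": app_rec.path,
        "hosts": sorted({m.host for m in matches}),
    }


def generate_provenance(app_records, os_records, workers=4):
    """Correlate every application record against the index using a thread pool."""
    # The index is built once and only read afterwards, so threads can share it.
    path_index = build_path_index(os_records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input records.
        return list(pool.map(lambda r: correlate(r, path_index), app_records))


if __name__ == "__main__":
    app_log = [AppRecord("alice", "read", "/data/a"),
               AppRecord("bob", "write", "/data/b")]
    os_log = [OsRecord(1, "/data/a", "node1"),
              OsRecord(2, "/data/a", "node2"),
              OsRecord(3, "/data/c", "node1")]
    for entry in generate_provenance(app_log, os_log):
        print(entry)
```

Keying the index by file path is an assumption made for illustration; a real system would likely need several indexes keyed by different identifiers, which may be one reason the paper constructs more than one auxiliary structure.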
Pages: 80806-80821
Page count: 16
Related papers
50 in total
  • [21] Research on Real-time Processing and Stream Analysis of Unstructured Data Based on Big Data Platforms
    Liang, Huichao
    Wang, Di
    Liu, Yuan
    Mei, Lin
    Zhou, Mengxue
    Zhao, Haibin
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 96 - 101
  • [22] Real-time Data Analysis Model of Power Grid Equipment Based on Big Data Monitoring
    Shi, Yingbin
    Wang, Jie
    Hou, Bing
    Zhan, Zhongqiang
    2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 705 - 708
  • [23] A Non-Intrusive and Real-Time Data Provenance Method for DDS Systems
    Wei, Siyi
    Tu, Jinbin
    Wang, Yun
    2023 19TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN 2023, 2023, : 439 - 446
  • [24] Research on Real-time Analysis and Hybrid Encryption of Big Data
    Yang Hui
    Li Zesong
    2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 52 - 55
  • [25] Real-Time Feedback Learning System Based on Programming Logs Analysis
    Huang, Sheng-Bo
    Lai, Chin-Feng
    Jeng, Yu-Lin
    JOURNAL OF INTERNET TECHNOLOGY, 2021, 22 (04): : 779 - 787
  • [26] RUBA: Real-time Unstructured Big Data Analysis Framework
    Kim, Jaein
    Kim, Nacwoo
    Lee, Byungtak
    Park, Joonho
    Seo, Kwangik
    Park, Hunyoung
    2013 INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2013): FUTURE CREATIVE CONVERGENCE TECHNOLOGIES FOR NEW ICT ECOSYSTEMS, 2013, : 520 - 524
  • [27] Real-time Analysis and Visualization for Big Data of Energy Consumption
    Li, Jiaxue
    Song, Wei
    Fong, Simon
    2017 INTERNATIONAL CONFERENCE ON SOFTWARE AND E-BUSINESS (ICSEB 2017), 2015, : 13 - 16
  • [28] Railway Big Data Real-time Processing Based on Storm
    Guo, Shihang
    Zhang, Lichen
    PROCEEDINGS OF THE 2016 2ND WORKSHOP ON ADVANCED RESEARCH AND TECHNOLOGY IN INDUSTRY APPLICATIONS, 2016, 81 : 536 - 539
  • [29] Near Real-Time Big Data Stream Processing Platform Using Cassandra
    Pal, Gautam
    Li, Gangmin
    Atkinson, Katie
    2018 4TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2018,
  • [30] InfoFrame table access method for real-time processing of big data
    Oosawa, Hideki
    Miyata, Tsuyoshi
    NEC Technical Journal, 2012, 7 (02): : 23 - 27