A Near Real-Time Big Data Provenance Generation Method Based on the Conjoint Analysis of Heterogeneous Logs

Cited by: 0
Authors
Gao, Yuanzhao [1]
Chen, Xingyuan [1, 2]
Li, Binglong [1]
Du, Xuehui [1]
Affiliations
[1] Zhengzhou Sci & Technol Inst, Zhengzhou 450000, Peoples R China
[2] State Key Lab Cryptol, Beijing 100878, Peoples R China
Source
IEEE ACCESS | 2023 / Vol. 11
Keywords
Big data provenance; provenance generation; multi-log conjoint analysis; Hadoop
DOI
10.1109/ACCESS.2023.3300844
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data provenance is an effective approach to data security supervision. In a distributed, multi-user, multi-layer big data system, only a provenance generation method that leverages information logged at both the application and operating system levels can completely obtain the provenance information required for data usage supervision. However, current research on the conjoint analysis of multiple logs is inadequate: existing methods struggle to integrate the provenance information extracted from different logs effectively, especially in big data scenarios. To achieve near real-time provenance generation from multiple heterogeneous logs, this paper takes a Hadoop-based big data system as the research object and proposes a parallel log analysis method based on auxiliary data structures and multi-threading. For efficient conjoint analysis of multiple logs, five auxiliary data structures are constructed as the medium for correlating and fusing log information, and a multi-threading method is presented to parallelize the lookup of provenance information. To cope with complex log record generation rules, log analysis methods for nondeterministic records, non-instantaneous operations, and instantaneous batch operations are proposed so that provenance information is generated correctly. In addition, a provenance generation framework is established to implement the proposed log analysis method. Experimental results show that the log collection time overhead caused by processing files above the MB level is less than 0.1%. The proposed method can analyze logs in near real time and generate provenance information correctly.
Pages: 80806-80821
Page count: 16