NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis

Cited by: 57

Authors:

Tang, Yutao [2]

Li, Ding [1]

Li, Zhichun [1]

Zhang, Mu [3]

Jee, Kangkook [1]

Xiao, Xusheng [4]

Wu, Zhenyu [1]

Rhee, Junghwan [1]

Xu, Fengyuan [5]

Li, Qun [2]

Affiliations:

[1] NEC Labs Amer Inc, Princeton, NJ 08540 USA

[2] Coll William & Mary, Williamsburg, VA 23187 USA

[3] Cornell Univ, Ithaca, NY 14853 USA

[4] Case Western Reserve Univ, Cleveland, OH 44106 USA

[5] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18) | 2018

Keywords:

Security; Data Reduction;

DOI:

10.1145/3243734.3243763

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

Today's enterprises are exposed to sophisticated attacks, such as Advanced Persistent Threats (APTs), which usually consist of multiple stealthy steps. To counter these attacks, enterprises often rely on causality analysis of the system activity data collected by ubiquitous system monitoring to discover the initial penetration point and, from there, identify previously unknown attack steps. However, one major challenge for causality analysis is that ubiquitous system monitoring generates a colossal amount of data, and hosting that much data is prohibitively expensive. Thus, there is a strong demand for techniques that reduce the storage of data for causality analysis while preserving the quality of the analysis. To address this problem, we propose NodeMerge, a template-based data reduction system for online system event storage. Specifically, our approach works directly on the stream of system dependency data and achieves data reduction on read-only file events based on their access patterns. It can either reduce the storage cost or improve the performance of causality analysis under the same budget. With only a reasonable amount of resources for online data reduction, it almost completely preserves the accuracy of causality analysis. The reduced form of the data can be used directly with little overhead. Our comprehensive evaluation shows that, for different categories of workloads, our system can reduce the storage required for raw system dependency data by as much as 75.7 times, and the storage required by the state-of-the-art approach by as much as 32.6 times. Furthermore, the results demonstrate that our approach keeps all the causality analysis information and has a reasonably small overhead in memory and hard disk usage.
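
Illustrative sketch (not from the paper): the abstract describes merging read-only file events according to their access patterns. The Python below shows one simple way such a template-based merge could look, assuming the templates are already known. The event format, the function name `merge_read_events`, the template name `T1`, and the file paths are all hypothetical, and the paper's actual template-learning and merging procedure may differ.

```python
from collections import defaultdict

def merge_read_events(events, templates):
    """Collapse groups of read-only file events into single template events.

    events: list of (process, file_path) read events (hypothetical format).
    templates: dict mapping a template name to a frozenset of file paths
        that are assumed to be read together (assumed to be given).
    """
    files_by_proc = defaultdict(list)
    for proc, path in events:
        files_by_proc[proc].append(path)

    reduced = []
    for proc, paths in files_by_proc.items():
        # Drop repeated reads of the same file, keeping first-seen order.
        remaining = list(dict.fromkeys(paths))
        for name, members in templates.items():
            # Merge only when the process read every file in the template.
            if members.issubset(remaining):
                remaining = [p for p in remaining if p not in members]
                reduced.append((proc, name))
        # Files not covered by any template stay as individual events.
        reduced.extend((proc, p) for p in remaining)
    return reduced

events = [
    ("p1", "/lib/a.so"), ("p1", "/lib/b.so"), ("p1", "/etc/conf"),
    ("p2", "/lib/a.so"), ("p2", "/lib/b.so"),
]
templates = {"T1": frozenset({"/lib/a.so", "/lib/b.so"})}
reduced = merge_read_events(events, templates)
```

In this toy input, five read events shrink to three. Because `templates` records which files each template node stands for, a later analysis could expand `T1` back into its member files, which is the sense in which a merge of this kind can keep dependency information.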


Pages: 1324-1337

Page count: 14
