Towards a Utopia of Dataset Sharing: A Case Study on Machine Learning-based Malware Detection Algorithms

Cited by: 0

Authors:

Chuang, Ping-Jui [1]

Hsu, Chih-Fan [1]

Chu, Yung-Tien [1]

Huang, Szu-Chun [1]

Huang, Chun-Ying [1]

Affiliations:

[1] National Yang Ming Chiao Tung University, Hsinchu, Taiwan

Source:

ASIA CCS '22: Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security | 2022

Keywords:

Dataset Sharing; Machine Learning; Malware Classification; Reproducible Research

DOI:

10.1145/3488932.3497763

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Working with a high-quality (complete and up-to-date) dataset is the key to building a good machine learning model, especially in security research areas. However, it is not easy to collect a good quality dataset for security research communities because of the sensitive property of most security datasets. We believe that having more contributors to share up-to-date samples would increase the quality of datasets. Therefore, this study aims to increase security dataset sharing for research communities by eliminating possible information leakage. We propose a dataset sharing model and the core algorithm, FeatureTransformer, which guarantees no sensitive information leakage from a shared dataset. FeatureTransformer transforms extracted raw features into intermediate features that conceal sensitive information. Meanwhile, models built from transformed features maintain similar performance compared to models built from the original raw features. We show the effectiveness of our model by evaluating FeatureTransformer with typical malware classification problems using (1) traditional machine learning classifiers and (2) neural network-based classifiers. The experiment results show that the models trained with transformed features merely suffer from 2.56% and 1.48% accuracy degradation on the investigated problems. It indicates that models validated by datasets processed by FeatureTransformer work well with the original raw (untransformed) datasets. We believe that our privacy-preserving model can stimulate dataset sharing and advance the development of machine learning approaches in solving security problems.


Pages: 479-493

Number of pages: 15
