Towards a Utopia of Dataset Sharing: A Case Study on Machine Learning-based Malware Detection Algorithms

Cited by: 0
Authors
Chuang, Ping-Jui [1]
Hsu, Chih-Fan [1]
Chu, Yung-Tien [1]
Huang, Szu-Chun [1]
Huang, Chun-Ying [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan
Keywords
Dataset Sharing; Machine Learning; Malware Classification; Reproducible Research
DOI
10.1145/3488932.3497763
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Working with a high-quality (complete and up-to-date) dataset is key to building a good machine learning model, especially in security research. However, collecting such a dataset is difficult for the security research community because most security datasets are sensitive. We believe that having more contributors share up-to-date samples would increase dataset quality. This study therefore aims to increase security dataset sharing among research communities by eliminating possible information leakage. We propose a dataset sharing model and its core algorithm, FeatureTransformer, which guarantees that no sensitive information leaks from a shared dataset. FeatureTransformer transforms extracted raw features into intermediate features that conceal sensitive information, while models built from the transformed features perform similarly to models built from the original raw features. We show the effectiveness of our model by evaluating FeatureTransformer on typical malware classification problems using (1) traditional machine learning classifiers and (2) neural network-based classifiers. The experimental results show that models trained with transformed features suffer only 2.56% and 1.48% accuracy degradation on the investigated problems. This indicates that models validated on datasets processed by FeatureTransformer work well with the original raw (untransformed) datasets. We believe that our privacy-preserving model can stimulate dataset sharing and advance the development of machine learning approaches to security problems.
Pages: 479-493
Page count: 15