Malware Classification Based on Semi-Supervised Learning

Cited by: 0

Authors:

Ding, Yu [1,2]

Zhang, XiaoYu [1]

Li, BinBin [1]

Xing, Jian [1,2,3]

Qiang, Qian [1,2,4]

Qi, ZiSen [1,2]

Guo, MengHan [1]

Jia, SiYu [1,2]

Wang, HaiPing [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China

[3] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Xinjiang Branch, Urumqi, Peoples R China

[4] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing, Peoples R China

Source:

SCIENCE OF CYBER SECURITY, SCISEC 2022 | 2022 / Vol. 13580

Funding:

National Natural Science Foundation of China;

Keywords:

Malware classification; Semi-supervised learning; Contrastive learning;

DOI:

10.1007/978-3-031-17551-0_19

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Malware has evolved rapidly in the past few years, causing serious threats and damage to network security. In response, researchers have proposed effective classification approaches for various malware variants. However, these widely used deep-learning methods are fully supervised and suffer from two inevitable problems: 1) time consumption: manually labeling data before training fully supervised models requires huge manual effort; 2) resource redundancy: a large amount of unlabeled data is not fully used, which wastes resources. To solve these problems, we propose a Malware Classification Method based on Semi-Supervised Learning, named MCM-SSL, which divides model training into a pre-training stage using unlabeled data and a fine-tuning stage using labeled data. The method makes effective use of a large amount of unlabeled data and needs only a small amount of labeled data to achieve excellent performance. Our method achieves an accuracy of 90.51% on the open-source Virus-MNIST dataset, which is superior to recent state-of-the-art methods. We also verify the generality and robustness of our method using a variety of common neural network algorithms. For the same algorithm, the accuracy of the pre-trained model is on average 2.4% higher than that of the model without pre-training.
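
The abstract does not say which objective drives the pre-training stage on unlabeled data; the keywords only mention contrastive learning. As a rough illustration of what such an objective can look like, the sketch below computes the NT-Xent loss, a common contrastive loss, over two augmented views of each sample. The choice of NT-Xent, the function names, and the temperature value are assumptions made here for illustration and are not taken from the paper.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss (an assumed objective, not confirmed by the abstract).

    views_a[i] and views_b[i] are embeddings of two augmented views of the
    same unlabeled sample; every other embedding in the batch is a negative.
    """
    z = [normalize(v) for v in views_a + views_b]
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        sims = [
            sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i  # exclude self-similarity from the denominator
        ]
        pos_sim = sum(a * b for a, b in zip(z[i], z[pos])) / temperature
        # Numerically stable log-sum-exp over all non-self similarities.
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denominator - pos_sim
    return total / (2 * n)


if __name__ == "__main__":
    random.seed(0)
    batch = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
    unrelated = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
    print("aligned views:  ", nt_xent_loss(batch, batch))
    print("unrelated views:", nt_xent_loss(batch, unrelated))
```

In a two-stage setup like the one the abstract describes, a loss of this kind would be minimized on unlabeled samples first, and the resulting encoder would then be fine-tuned with a standard classification loss on the small labeled set.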

Pages: 287 - 301

Page count: 15

50 items in total

[1] Malware detection based on semi-supervised learning with malware visualization
Gao, Tan
Zhao, Lan
Li, Xudong
Chen, Wen
[J]. MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2021, 18 (05) : 5995 - 6011
[2] Malware classification for the cloud via semi-supervised transfer learning
Gao, Xianwei
Hu, Changzhen
Shan, Chun
Liu, Baoxu
Niu, Zequn
Xie, Hui
[J]. JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2020, 55
[3] A Novel Malware Traffic Classification Method using Semi-Supervised Learning
Ning, Jinhui
Wang, Yu
Yang, Jie
Gacanin, Haris
Ci, Song
[J]. 2021 IEEE 94TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-FALL), 2021
[4] Semi-supervised Learning for Unknown Malware Detection
Santos, Igor
Nieves, Javier
Bringas, Pablo G.
[J]. INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2011, 91 : 415 - 422
[5] Semi-Supervised Classification Based on Transformed Learning
Kang Z.
Liu L.
Han M.
[J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (01) : 103 - 111
[6] TEXT CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING
Vo Duy Thanh
Vo Trung Hung
Pham Minh Tuan
Doan Van Ban
[J]. 2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013 : 232 - 236
[7] Participatory Learning based Semi-supervised Classification
Deng, Chao
Guo, Mao-Zu
Liu, Yang
Li, Hai-Feng
[J]. ICNC 2008: FOURTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 4, PROCEEDINGS, 2008 : 207 - 216
[8] Dynamic Android Malware Category Classification using Semi-Supervised Deep Learning
Mahdavifar, Samaneh
Kadir, Andi Fitriah Abdul
Fatemi, Rasool
Alhadidi, Dima
Ghorbani, Ali A.
[J]. 2020 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2020 : 515 - 522
[9] A Weak Coupling of Semi-Supervised Learning with Generative Adversarial Networks for Malware Classification
Wang, Shuwei
Wang, Qiuyun
Jiang, Zhengwei
Wang, Xuren
Jing, Rongqi
[J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 3775 - 3782
[10] Semi-Supervised Learning Based on Cataract Classification and Grading
Song, Wenai
Wang, Ping
Zhang, Xudong
Wang, Qing
[J]. PROCEEDINGS 2016 IEEE 40TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE WORKSHOPS (COMPSAC), VOL 2, 2016 : 641 - 646
