Malware Classification Based on Semi-Supervised Learning

Authors
Ding, Yu [1,2]
Zhang, XiaoYu [1]
Li, BinBin [1]
Xing, Jian [1,2,3]
Qiang, Qian [1,2,4]
Qi, ZiSen [1,2]
Guo, MengHan [1]
Jia, SiYu [1,2]
Wang, HaiPing [1,2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Xinjiang Branch, Urumqi, Peoples R China
[4] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Malware classification; Semi-supervised learning; Contrastive learning
DOI
10.1007/978-3-031-17551-0_19
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The rapid evolution of malware over the past few years has caused serious threats and damage to network security. In response, researchers have proposed effective classification approaches for various malware variants. However, the widely used deep learning methods are fully supervised, which leads to two unavoidable problems: 1) time consumption: manually labeling data before training a fully supervised model requires huge manual effort; 2) resource redundancy: a large amount of unlabeled data is not fully used, which wastes resources. To solve these problems, we propose a Malware Classification Method based on Semi-Supervised Learning, named MCM-SSL, which divides model training into a pre-training stage that uses unlabeled data and a fine-tuning stage that uses labeled data. The method makes effective use of a large amount of unlabeled data and needs only a small amount of labeled data to achieve excellent performance. Our method achieves an accuracy of 90.51% on the open-source Virus-MNIST dataset, which is superior to recent state-of-the-art methods. We also verify the generality and robustness of our method using a variety of common neural network algorithms. For the same algorithm, the accuracy of the pre-trained model is on average 2.4% higher than that of the model without pre-training.
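The abstract does not state which objective drives the pre-training stage on unlabeled data; the keywords only mention contrastive learning. As a rough illustration, the sketch below assumes an NT-Xent (SimCLR-style) contrastive loss, which is one common choice and not necessarily the one used in MCM-SSL. The function names `cosine` and `nt_xent_loss`, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss over 2N embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of the same
    unlabeled sample (the positive pair). Every other embedding in the batch
    acts as a negative. No labels are needed.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # log-sum-exp over all candidates except i itself, for stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy embeddings (assumed values): two samples, two views each.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[1.0, 0.1], [0.1, 1.0]]      # views agree with view_a
mismatched_b = [[0.1, 1.0], [1.0, 0.1]]   # views paired with the wrong sample

aligned_loss = nt_xent_loss(view_a, aligned_b)
mismatched_loss = nt_xent_loss(view_a, mismatched_b)
print(aligned_loss, mismatched_loss)
```

Under this assumed objective, an encoder is rewarded when two views of the same sample land close together, so the aligned pairing yields the lower loss. A fine-tuning stage would then train a classifier head on the small labeled set, starting from the pre-trained encoder weights.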
Pages: 287-301
Page count: 15