Multi-Modal Clustering Discovery Method for Illegal Websites Based on Network Surveying and Mapping Big Data

Cited by: 0
Authors
Wang, Bo [1]
Shi, Fan [1]
Zheng, Haiyang [1]
Affiliation
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230037, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 17
Keywords
unsupervised learning; clustering; multimodal; network mapping
DOI
10.3390/app13179837
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
With the development of internet technology, the number of illicit websites, such as gambling and pornography sites, has increased dramatically, posing serious threats to people's physical and mental health as well as their financial security. Currently, the governance of such websites relies mainly on limited-scale detection through manual annotation. Effective governance, however, urgently requires the ability to rapidly acquire large volumes of existing website data from the internet. Web mapping engines can provide massive, near-real-time web data, which plays a crucial role in the batch detection of illicit websites. In this paper, we therefore propose a method that uses web mapping engine big data to perform unsupervised multimodal clustering (MDC) for illicit website discovery. We extract features from webpage screenshots and OCR text using contrastive learning methods and then cluster by feature similarity to identify illicit websites. Our unsupervised clustering model achieved an overall accuracy of 84.1% across all confidence levels and an accuracy of 92.39% at confidence levels of 0.999 or higher. Applying the MDC model to 3.7 million real web mapping records, we obtained 397,275 illicit websites, primarily gambling and pornography sites, each described by 14 attributes. This dataset is made publicly available.
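The abstract's idea of clustering by feature similarity and then keeping only high-confidence assignments can be illustrated with a minimal sketch. This is not the authors' MDC model: the fusion by concatenation, the softmax over cosine similarities, the `temperature` parameter, and all function names are assumptions made for illustration. The image and text embeddings and the cluster centroids are assumed to come from some upstream encoder and clustering step, and the 0.999 default only echoes the confidence level quoted in the abstract.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_vec, text_vec):
    """Hypothetical fusion: concatenate the normalized screenshot and
    OCR-text embeddings, then normalize the result again."""
    return l2_normalize(l2_normalize(image_vec) + l2_normalize(text_vec))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def assign(feature, centroids, temperature=0.1):
    """Return (cluster index, confidence), where confidence is the softmax
    probability of the most similar centroid. The temperature is an assumption."""
    sims = [cosine(feature, c) / temperature for c in centroids]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def filter_confident(features, centroids, threshold=0.999):
    """Keep (sample index, cluster, confidence) for confident assignments only."""
    kept = []
    for idx, feature in enumerate(features):
        cluster, conf = assign(feature, centroids)
        if conf >= threshold:
            kept.append((idx, cluster, conf))
    return kept
```

Raising the threshold trades coverage for precision: fewer sites are retained, but each retained assignment sits much closer to a single centroid than to the others, which mirrors the gap the abstract reports between accuracy across all confidence levels and accuracy at 0.999 or higher.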
Pages: 20
Related papers
50 in total
  • [1] A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data
    Yan, An
    Wang, Wei
    Ren, Yi
    Geng, HongWei
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [2] A calculation method of OD matrix in multi-modal transit network based on traffic big data
    Gao, Li-Xiao
    Hu, Ji-hua
    Li, Guo-yuan
    Liang, Jia-xian
    3RD INTERNATIONAL CONFERENCE ON TRANSPORTATION INFORMATION AND SAFETY (ICTIS 2015), 2015, : 295 - 298
  • [3] Multi-Modal Data Fusion for Big Events
    Papacharalapous, A. E.
    Hovelynck, Stefan
    Cats, O.
    Lankhaar, J. W.
    Daamen, W.
    van Oort, N.
    van Lint, J. W. C.
    IEEE INTELLIGENT TRANSPORTATION SYSTEMS MAGAZINE, 2015, 7 (04) : 5 - 10
  • [4] User Multi-Modal Emotional Intelligence Analysis Method Based on Deep Learning in Social Network Big Data Environment
    Zhang, Chunqin
    Xie, Lichun
    Aizezi, Yasen
    Gu, Xiaoqing
    IEEE ACCESS, 2019, 7 : 181758 - 181766
  • [5] Spatial mapping of multi-modal data in neuroscience
    Hawrylycz, Mike
    Sunkin, Susan
    Ng, Lydia
    METHODS, 2015, 73 : 1 - 3
  • [6] Multi-Modal Joint Clustering With Application for Unsupervised Attribute Discovery
    Liu, Liangchen
    Nie, Feiping
    Wiliem, Arnold
    Li, Zhihui
    Zhang, Teng
    Lovell, Brian C.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (09) : 4345 - 4356
  • [7] A multi-modal health data fusion and analysis method based on body sensor network
    Wang, Lei
    Chen, Yibo
    Zhao, Zhenying
    Zhao, Lingxiao
    Li, Jin
    Li, Cuimin
    INTERNATIONAL JOURNAL OF SERVICES TECHNOLOGY AND MANAGEMENT, 2019, 25 (5-6) : 474 - 491
  • [8] Structure Discovery in Multi-modal Data: a Region-based Approach
    Collet, Alvaro
    Srinivasa, Siddhartha S.
    Hebert, Martial
    2011 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2011,
  • [9] Truth Discovery With Multi-Modal Data in Social Sensing
    Shao, Huajie
    Sun, Dachun
    Yao, Shuochao
    Su, Lu
    Wang, Zhibo
    Liu, Dongxin
    Liu, Shengzhong
    Kaplan, Lance
    Abdelzaher, Tarek
    IEEE TRANSACTIONS ON COMPUTERS, 2021, 70 (09) : 1325 - 1337
  • [10] New System of Multi-modal Information Hiding Based on Big Data Environment
    Huang D.-Z.
    Zhang J.-F.
    Zhang R.
    Li P.-C.
    Guo Y.-B.
    Zhang, Jing-Fei (buptzhjf@163.com), 1600, Chinese Institute of Electronics (45): : 477 - 484