Bot recognition in a Web store: An approach based on unsupervised learning

Cited by: 22
Authors
Rovetta, Stefano [1]
Suchacka, Grazyna [2]
Masulli, Francesco [1]
Affiliations
[1] Univ Genoa, Dept Informat Bioengn Robot & Syst Engn, Genoa, Italy
[2] Univ Opole, Inst Informat, Opole, Poland
Keywords
Web bot; Internet robot; Web bot detection; Supervised classification; Unsupervised classification; Machine learning; Web server; Robot detection; Neural network; Behavior; Attacks
DOI
10.1016/j.jnca.2020.102577
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots), which pose a threat to website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, traffic generated by legitimate users must be accurately separated from traffic generated by Web bots. This paper proposes a machine learning solution to the problem of classifying sessions as bot or human, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means) followed by supervised labelling of the clusters: a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data under realistic conditions and compared with that of supervised classifiers (a multi-layer perceptron neural network and a support vector machine). Results show that classification based on unsupervised learning is very efficient, achieving performance similar to that of fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully addressed with methods that are less sensitive to mislabelled data or missing labels. In both cases a very small fraction of sessions remains misclassified, so an in-depth analysis of the misclassified samples was also performed. This analysis revealed the superiority of the proposed approach, which in fact correctly recognized more bots and identified more camouflaged agents that had been erroneously labelled as humans.
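The "cluster first, then label the clusters" strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only plain k-means (Graded Possibilistic c-Means is not shown), the two session features (requests per session, share of image requests) and all data values are hypothetical toy numbers, and each cluster is assumed to take the majority label of whatever labelled sessions fall into it, with unlabelled sessions (None) simply ignored during labelling.

```python
import math
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each session goes to its nearest centroid.
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:
            break  # converged: no session changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assign


def label_clusters(assign, labels, k, default="human"):
    """Give each cluster the majority label of its labelled sessions.

    Unlabelled sessions (label None) shape the clusters but cast no vote;
    `default` (an assumption) is used for clusters with no labelled member.
    """
    cluster_labels = []
    for j in range(k):
        votes = Counter(lab for a, lab in zip(assign, labels)
                        if a == j and lab is not None)
        cluster_labels.append(votes.most_common(1)[0][0] if votes else default)
    return cluster_labels


def predict(session, centroids, cluster_labels):
    """Classify a new session by the label of its nearest cluster."""
    return cluster_labels[nearest(session, centroids)]


# Hypothetical toy sessions: (requests per session, share of image requests).
sessions = [(5, 0.8), (6, 0.7), (4, 0.9), (50, 0.0), (55, 0.1), (60, 0.05)]
labels = ["human", None, "human", "bot", None, "bot"]  # some labels missing

centroids, assign = kmeans(sessions, k=2)
cluster_labels = label_clusters(assign, labels, k=2)
print(predict((58, 0.02), centroids, cluster_labels))  # → bot
print(predict((5, 0.85), centroids, cluster_labels))   # → human
```

Because the clustering step never looks at the labels, a few wrong or missing labels only affect the majority vote of a cluster rather than the decision boundary itself, which is the intuition behind the abstract's claim of reduced sensitivity to mislabelled data. In practice features would normally be scaled before clustering; the toy data here skips that for brevity.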
Pages: 15