Real-time user clickstream behavior analysis based on apache storm streaming

Cited by: 2
Authors
Pal, Gautam [1]
Atkinson, Katie [1]
Li, Gangmin [2]
Affiliations
[1] Univ Liverpool, Dept Comp Sci, Liverpool L69 7ZX, Merseyside, England
[2] Univ Bedfordshire, Sch Comp Sci & Technol, Luton LU1 3JU, Beds, England
Keywords
Clickstream analytics; Real-time big data analytics; Real-time data ingestion; Apache Storm; Cassandra; DataStax; Sparsity problem
DOI
10.1007/s10660-021-09518-4
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
This paper presents an approach to analyzing consumers' e-commerce site usage and browsing motifs through pattern mining and surfing behavior. The user-generated clickstream is first stored in the client-side browser. We build an ingestion pipeline with Apache Storm, Kafka, and Cassandra to capture the high-velocity data stream from the client-side browser. Given the consumer's usage pattern, we uncover the user's browsing intent through n-grams and collocation methods. A clustering technique is constructed using the Expectation-Maximization (EM) algorithm with a Gaussian Mixture Model (GMM). We discuss a framework for predicting a user's clicks from past click sequences using higher-order Markov chains. We developed our model on top of a big data Lambda Architecture, which combines a high-throughput Hadoop batch layer with a low-latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup with an optimized Storm topology and reduced Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated by several evaluations on an Apache Storm deployment in Microsoft Azure HDInsight and on the DataStax distribution of Cassandra. The paper demonstrates that the proposed techniques support user-experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
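The abstract describes predicting a user's next click from past click sequences with n-grams and higher-order Markov chains. The following is a minimal sketch, not the authors' implementation. It assumes sessions are already grouped as ordered lists of page identifiers. It counts transitions from contexts of up to `order` previous pages and backs off to shorter contexts when a longer one was never observed. The class name `MarkovClickPredictor`, the backoff strategy, and the page labels are hypothetical illustrations.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple


class MarkovClickPredictor:
    """Order-k Markov model over click sequences with backoff to shorter contexts.

    Hypothetical illustration: counts n-gram transitions (context of up to
    `order` previous pages -> next page) and predicts the most frequent next page.
    """

    def __init__(self, order: int = 2) -> None:
        if order < 1:
            raise ValueError("order must be >= 1")
        self.order = order
        # counts[k][context] maps a length-k context tuple to a Counter of next pages.
        self.counts: Dict[int, Dict[Tuple[str, ...], Counter]] = {
            k: defaultdict(Counter) for k in range(1, order + 1)
        }

    def fit(self, sessions: Sequence[Sequence[str]]) -> "MarkovClickPredictor":
        """Count transitions for every context length 1..order in each session."""
        for clicks in sessions:
            for i in range(1, len(clicks)):
                for k in range(1, self.order + 1):
                    if i - k < 0:
                        break  # not enough preceding clicks for this order
                    context = tuple(clicks[i - k:i])
                    self.counts[k][context][clicks[i]] += 1
        return self

    def predict(self, history: Sequence[str]) -> Optional[str]:
        """Return the most likely next page, backing off from the longest context."""
        for k in range(min(self.order, len(history)), 0, -1):
            context = tuple(history[-k:])
            next_pages = self.counts[k].get(context)  # .get avoids inserting empty entries
            if next_pages:
                # Ties resolve to the page first seen during fit (Counter ordering).
                return next_pages.most_common(1)[0][0]
        return None  # no context of any length has been observed


if __name__ == "__main__":
    sessions = [
        ["home", "deals", "cart"],
        ["home", "deals", "cart"],
        ["search", "deals", "exit"],
    ]
    model = MarkovClickPredictor(order=2).fit(sessions)
    print(model.predict(["search", "deals"]))  # order-2 context seen: exit
    print(model.predict(["blog", "deals"]))    # backs off to ("deals",): cart
```

A first-order model would predict `cart` after `deals` for every visitor. The second-order context lets the `search` path predict `exit` instead. In a pipeline like the one the abstract describes, such counts would presumably be updated from the Storm stream and persisted to Cassandra. This in-memory sketch does not attempt either step.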
Pages: 1829 - 1859
Page count: 31
Related papers
50 in total
  • [1] Real-time user clickstream behavior analysis based on apache storm streaming
    Pal, Gautam
    Atkinson, Katie
    Li, Gangmin
    Electronic Commerce Research, 2023, 23 : 1829 - 1859
  • [2] Apache Storm Based on Topology for Real-Time Processing of Streaming Data from Social Networks
    Batyuk, Anatoliy
    Voityshyn, Volodymyr
    PROCEEDINGS OF THE 2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA STREAM MINING & PROCESSING (DSMP), 2016, : 345 - 349
  • [3] Real-time incremental recommendation for streaming data based on apache flink
    Tang, Zhuo
    Liu, Zeyu
    Li, Kenli
    Li, Keqin
    INTELLIGENT DATA ANALYSIS, 2019, 23 (06) : 1421 - 1437
  • [4] Machine Learning-Based Real-time Task Scheduling for Apache Storm
    Wu, Cheng-Ying
    Zhao, Qi
    Cheng, Cheng-Yu
    Yang, Yuchen
    Qureshi, Muhammad A.
    Liu, Hang
    Chen, Genshe
    SENSORS AND SYSTEMS FOR SPACE APPLICATIONS XVII, 2024, 13062
  • [5] Real-time User-click Recognition Based on Spark Streaming
    Lin, Xiangyue
    Liu, Fang
    Liu, Jun
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2532 - 2536
  • [6] Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark
    Armbrust, Michael
    Das, Tathagata
    Torres, Joseph
    Yavuz, Burak
    Zhu, Shixiong
    Xin, Reynold
    Ghodsi, Ali
    Stoica, Ion
    Zaharia, Matei
    SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 601 - 613
  • [7] Detecting a Large Number of Objects in Real-time Using Apache Storm
    Im, Dong-Hyuck
    Cho, Cheol-Hye
    Jung, IlGu
    2014 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC), 2014, : 836 - 838
  • [8] Real-time Hybrid Intrusion Detection System using Apache Storm
    Mylavarapu, Goutam
    Thomas, Johnson
    Kumar, Ashwin T. K.
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 1436 - 1441
  • [9] Dynamically Scaling Apache Storm for the Analysis of Streaming Data
    van der Veen, Jan Sipke
    van der Waaij, Bram
    Lazovik, Elena
    Wijbrandi, Wilco
    Meijer, Robert J.
    2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015), 2015, : 154 - 161
  • [10] Unsupervised Clickstream Clustering for User Behavior Analysis
    Wang, Gang
    Zhang, Xinyi
    Tang, Shiliang
    Zheng, Haitao
    Zhao, Ben Y.
    34TH ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2016, 2016, : 225 - 236