Real-time user clickstream behavior analysis based on Apache Storm streaming

Cited by: 2
Authors
Pal, Gautam [1]
Atkinson, Katie [1]
Li, Gangmin [2]
Affiliations
[1] Univ Liverpool, Dept Comp Sci, Liverpool L69 7ZX, Merseyside, England
[2] Univ Bedfordshire, Sch Comp Sci & Technol, Luton LU1 3JU, Beds, England
Keywords
Clickstream analytics; Real-time big data analytics; Real-time data ingestion; Apache Storm; Cassandra; DataStax; Sparsity problem
DOI
10.1007/s10660-021-09518-4
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
This paper presents an approach to analyzing consumers' e-commerce site usage and browsing motifs through pattern mining and surfing behavior. The user-generated clickstream is first stored in the client-side browser. We build an ingestion pipeline that captures this high-velocity data stream from the browser through Apache Storm, Kafka, and Cassandra. Given the consumer's usage pattern, we uncover the user's browsing intent through n-grams and collocation methods. A clustering technique is constructed using the Expectation-Maximization algorithm with a Gaussian Mixture Model. We discuss a framework for predicting a user's clicks from past click sequences using higher-order Markov chains. We developed our model on top of a big data Lambda Architecture, which combines a high-throughput Hadoop batch setup with a low-latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup with an optimized Storm topology and reduced Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated by several evaluations on a Microsoft Azure HDInsight Apache Storm deployment and on the DataStax distribution of Cassandra. The paper demonstrates that the proposed techniques support user experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
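As a rough illustration of the click-prediction idea mentioned in the abstract, the sketch below fits a second-order Markov chain to click sequences and predicts the most frequent next page. It is not the authors' implementation: the function names `train_markov` and `predict_next`, the page labels, and the toy sessions are all assumptions made for this example, and the record does not state which chain order or smoothing the paper uses.

```python
from collections import Counter, defaultdict


def train_markov(sessions, order=2):
    """Count next-page frequencies for each context of `order` consecutive pages."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - order):
            context = tuple(pages[i:i + order])
            counts[context][pages[i + order]] += 1
    return counts


def predict_next(counts, recent_pages, order=2):
    """Return the most frequent next page for the last `order` clicks, or None if unseen."""
    context = tuple(recent_pages[-order:])
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]


# Hypothetical toy sessions; real input would come from the ingested clickstream.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "reviews"],
    ["home", "search", "product", "cart", "checkout"],
]

model = train_markov(sessions, order=2)
prediction = predict_next(model, ["home", "search", "product"], order=2)
print(prediction)
```

Raising `order` conditions each prediction on a longer click history, at the cost of more contexts that never appear in the training data, which is one way the sparsity problem listed in the keywords can arise.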
Pages: 1829-1859
Number of pages: 31