Real-time user clickstream behavior analysis based on Apache Storm streaming

Cited by: 0
Authors
Gautam Pal
Katie Atkinson
Gangmin Li
Affiliations
[1] The University of Liverpool, Department of Computer Science
[2] University of Bedfordshire, School of Computer Science & Technology
Keywords
Clickstream analytics; Real-time big data analytics; Real-time data ingestion; Apache Storm; Cassandra; DataStax
DOI
Not available
Abstract
This paper presents an approach to analyzing consumers' e-commerce site usage and browsing motifs through pattern mining of their surfing behavior. The user-generated clickstream is first stored in the client-side browser. We build an ingestion pipeline that captures this high-velocity data stream from the browser through Apache Storm, Kafka, and Cassandra. Given a consumer's usage pattern, we uncover the user's browsing intent through n-gram and collocation methods. We construct an innovative clustering technique using the Expectation-Maximization algorithm with a Gaussian Mixture Model. We also discuss a framework for predicting a user's next clicks from past click sequences through higher-order Markov chains. The model is built on top of a big data Lambda Architecture, which combines a high-throughput Hadoop batch setup with a low-latency real-time framework over a large distributed cluster. Based on this approach, we develop an experimental setup with an optimized Storm topology and improved Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated by several evaluations on a Microsoft Azure HDInsight Apache Storm deployment and on the DataStax distribution of Cassandra. The paper demonstrates that the proposed techniques support user experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
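As a rough illustration of the click-prediction idea mentioned in the abstract, the sketch below builds a second-order Markov chain from click sequences and predicts the most frequent next page. It is not the authors' implementation: the function names (build_transitions, predict_next), the page labels, and the sample sessions are all hypothetical, and the whole thing runs in memory rather than on a Storm topology.

```python
from collections import Counter, defaultdict


def build_transitions(sessions, order=2):
    """Count next-page frequencies for every run of `order` consecutive pages."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - order):
            state = tuple(pages[i:i + order])        # the last `order` clicks
            transitions[state][pages[i + order]] += 1  # the click that followed
    return transitions


def predict_next(transitions, recent_pages, order=2):
    """Return the most frequent next page after the last `order` pages, or None."""
    state = tuple(recent_pages[-order:])
    counts = transitions.get(state)
    if not counts:
        return None  # this click history was never observed
    return counts.most_common(1)[0][0]


# Hypothetical click sessions, invented for illustration only.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "reviews"],
    ["home", "search", "product", "cart", "checkout"],
]

model = build_transitions(sessions, order=2)
print(predict_next(model, ["home", "search", "product"]))
```

Raising `order` conditions the prediction on a longer click history, at the cost of sparser counts; a production system would typically also smooth or back off to lower orders when a state has not been seen.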
Pages: 1829 - 1859
Page count: 30
Related papers
50 in total
  • [21] Analysis of FEC function for real-time DV streaming
    Matsuzono, Kazuhisa
    Asaeda, Hitoshi
    Sugiura, Kazunori
    Nakamura, Osamu
    Murai, Jun
    SUSTAINABLE INTERNET, PROCEEDINGS, 2007, 4866 : 114 - +
  • [22] Arkitekt: streaming analysis and real-time workflows for microscopy
    Roos, Johannes
    Bancelin, Stephane
    Delaire, Tom
    Wilhelmi, Alexander
    Levet, Florian
    Engelhardt, Maren
    Viasnoff, Virgile
    Galland, Remi
    Naegerl, U. Valentin
    Sibarita, Jean-Baptiste
    NATURE METHODS, 2024, 21 (10) : 1884 - 1894
  • [23] Streaming Data Movement for Real-Time Image Analysis
    Abelardo López-Lagunas
    Sek Chai
    Journal of Signal Processing Systems, 2011, 62 : 29 - 42
  • [24] Streaming Data Movement for Real-Time Image Analysis
    Lopez-Lagunas, Abelardo
    Chai, Sek
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2011, 62 (01) : 29 - 42
  • [25] Spray: Streaming Log Parser for Real-Time Analysis
    Zou, Feng
    Chen, Xingshu
    Luo, Yonggang
    Huang, Tiemai
    Liao, Zhihong
    Song, Keer
    SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [26] Real-time Evaluation Mechanism Based on Double Evidence Classification of User Behavior
    Zhang, Jiale
    Zhang, Guiling
    Zhang, Xiufang
    INTERNATIONAL JOURNAL OF SECURITY AND ITS APPLICATIONS, 2016, 10 (12) : 31 - 42
  • [27] Game Service Platform based on the Real-time Streaming
    Kim, Kyoung-ill
    Lee, Kyu-chul
    2012 7TH INTERNATIONAL CONFERENCE ON COMPUTING AND CONVERGENCE TECHNOLOGY (ICCCT2012), 2012, : 253 - 256
  • [28] Big Data Real-time Processing Based on Storm
    Yang, Wenjie
    Liu, Xingang
    Zhang, Lan
    Yang, Laurence T.
    2013 12TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2013), 2013, : 1784 - 1787
  • [29] Global time-based synchronization of real-time multimedia streaming
    Kim, MH
    Jo, EH
    Kim, DH
    NINTH IEEE INTERNATIONAL WORKSHOP ON OBJECT-ORIENTED REAL-TIME DEPENDABLE SYSTEMS, 2004, : 101 - 108
  • [30] Efficient topic partitioning of Apache Kafka for high-reliability real-time data streaming applications
    Raptis, Theofanis P.
    Cicconetti, Claudio
    Passarella, Andrea
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 154 : 173 - 188