Real-time user clickstream behavior analysis based on Apache Storm streaming

Cited by: 0
Authors
Gautam Pal
Katie Atkinson
Gangmin Li
Affiliations
[1] The University of Liverpool, Department of Computer Science
[2] University of Bedfordshire, School of Computer Science & Technology
Keywords
Clickstream analytics; Real-time big data analytics; Real-time data ingestion; Apache Storm; Cassandra; DataStax
DOI
Not available
Abstract
This paper presents an approach to analyzing consumers’ e-commerce site usage and browsing motifs through pattern mining and surfing behavior. User-generated clickstream data is first stored in the client-side browser. We build an ingestion pipeline with Apache Storm, Kafka, and Cassandra to capture the high-velocity data stream from the client-side browser. Given a consumer’s usage pattern, we uncover the user’s browsing intent through n-gram and collocation methods. We construct a new clustering technique using the Expectation-Maximization (EM) algorithm with a Gaussian Mixture Model (GMM). We discuss a framework for predicting a user’s clicks from past click sequences using higher-order Markov chains. We built our model on a big data Lambda Architecture, which combines a high-throughput Hadoop batch setup with a low-latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup with an optimized Storm topology and reduced Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated by several evaluations on a Microsoft Azure HDInsight Apache Storm deployment and on the DataStax distribution of Cassandra. The paper demonstrates that the proposed techniques support user experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
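To make the idea of predicting clicks with a higher-order Markov chain concrete, here is a minimal sketch, not the authors' implementation. It assumes that each session is an ordered list of page identifiers, that transition probabilities are plain maximum-likelihood counts with no smoothing or backoff, and that the most probable next page is returned. The class name `MarkovClickPredictor`, its methods, and the page names are all hypothetical and are used only for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple

# A state is the tuple of the last `order` pages a user visited.
State = Tuple[str, ...]


class MarkovClickPredictor:
    """Hypothetical k-th order Markov chain over page-view sequences (sketch)."""

    def __init__(self, order: int = 2) -> None:
        if order < 1:
            raise ValueError("order must be >= 1")
        self.order = order
        # state -> counts of the page clicked immediately after that state
        self.transitions: Dict[State, Counter] = defaultdict(Counter)

    def fit(self, sessions: Sequence[Sequence[str]]) -> "MarkovClickPredictor":
        # Slide a window of length `order` over each session and count what follows it.
        for clicks in sessions:
            for i in range(len(clicks) - self.order):
                state = tuple(clicks[i:i + self.order])
                self.transitions[state][clicks[i + self.order]] += 1
        return self

    def next_probabilities(self, history: Sequence[str]) -> Dict[str, float]:
        # Maximum-likelihood estimate P(next | last `order` pages); empty if unseen.
        if len(history) < self.order:
            return {}
        counts = self.transitions.get(tuple(history[-self.order:]))
        if not counts:
            return {}
        total = sum(counts.values())
        return {page: n / total for page, n in counts.items()}

    def predict(self, history: Sequence[str]) -> Optional[str]:
        probs = self.next_probabilities(history)
        if not probs:
            return None
        # Highest probability wins; ties are broken alphabetically for determinism.
        return min(probs, key=lambda page: (-probs[page], page))


# Hypothetical click sessions from an e-commerce site.
sessions = [
    ["home", "laptops", "laptop-x", "cart"],
    ["home", "laptops", "laptop-y"],
    ["home", "laptops", "laptop-x", "checkout"],
]
model = MarkovClickPredictor(order=2).fit(sessions)
print(model.predict(["home", "laptops"]))  # laptop-x
```

With `order=2`, the state `("home", "laptops")` was followed by `laptop-x` twice and `laptop-y` once, so the sketch predicts `laptop-x` with probability 2/3. Raising `order` captures longer browsing context but makes each state rarer, which is why a real system would likely add smoothing or fall back to a lower order for unseen states.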
Pages: 1829-1859
Number of pages: 30