Optimal algorithms for finding user access sessions from very large web logs

Cited by: 22
Authors
Chen, ZX [1]
Fu, AWC
Tong, FCH
Affiliations
[1] Univ Texas Pan Amer, Dept Comp Sci, Edinburg, TX USA
[2] Chinese Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[3] Univ Hong Kong, Dept Comp Sci & Informat Syst, Hong Kong, Peoples R China
Keywords
web log mining; data preparation; user access sessions; data structures; time complexity
DOI
10.1023/A:1024606901978
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Although efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for the success of higher level web log mining, little attention has been paid to algorithmic study of this problem. In this paper we consider two types of user access sessions, interval sessions and gap sessions. We design two efficient algorithms for finding respectively those two types of sessions with the help of some proposed structures. We present theoretical analysis of the algorithms and prove that both algorithms have optimal time complexity and certain error-tolerant properties as well. We conduct empirical performance analysis of the algorithms with web logs ranging from 100 megabytes to 500 megabytes. The empirical analysis shows that the algorithms just take several seconds more than the baseline time, i.e., the time needed for reading the web log once sequentially from disk to RAM, testing whether each user access record is valid or not, and writing each valid user access record back to disk. The empirical analysis also shows that our algorithms are substantially faster than the sorting based session finding algorithms. Finally, optimal algorithms for finding user access sessions from distributed web logs are also presented.
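The two session types named in the abstract can be illustrated with a minimal single-pass sketch. It is not the authors' algorithm or data structure: the abstract does not define the session types, so the definitions used here are assumptions (an interval session holds a user's accesses within a fixed time span of the session's first access; a gap session holds accesses whose consecutive time gaps stay within a threshold). The record layout `(user, timestamp, url)`, the function names, and the thresholds are all hypothetical, and the log is assumed to be in non-decreasing timestamp order.

```python
def _split(records, should_close):
    """One pass over time-ordered records; one open session per user in a dict."""
    open_sessions = {}   # user -> list of (timestamp, url) for the session in progress
    finished = []        # closed sessions as (user, accesses) pairs
    for user, ts, url in records:
        current = open_sessions.get(user)
        if current and should_close(current, ts):
            finished.append((user, current))
            current = None
        if not current:
            current = open_sessions[user] = []
        current.append((ts, url))
    # Sessions still open at end of log are emitted last.
    finished.extend(open_sessions.items())
    return finished


def interval_sessions(records, max_length):
    # Assumed definition: close once the access is more than
    # max_length after the FIRST access of the session.
    return _split(records, lambda s, ts: ts - s[0][0] > max_length)


def gap_sessions(records, max_gap):
    # Assumed definition: close once the access is more than
    # max_gap after the LAST access of the session.
    return _split(records, lambda s, ts: ts - s[-1][0] > max_gap)


# Hypothetical log: (user, seconds since start, url)
log = [("a", 0, "/"), ("b", 5, "/x"), ("a", 20, "/p"), ("a", 40, "/q"), ("a", 100, "/r")]
by_gap = gap_sessions(log, max_gap=30)
by_interval = interval_sessions(log, max_length=30)
```

With dictionary lookups treated as constant time, the sketch touches each record once, which is the intuition behind a linear-time bound. It does not attempt the error-tolerant properties or the distributed setting mentioned in the abstract.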
Pages: 259-279
Page count: 21
Related papers
50 in total
  • [22] Web access to large audiovisual assets based on user preferences
    Karpouzis, K
    Moschovitis, G
    Ntalianis, K
    Ioannou, S
    Kollias, S
    MULTIMEDIA TOOLS AND APPLICATIONS, 2004, 22 (03) : 215 - 234
  • [23] Mining Web Usage Profiles from Proxy Logs: User Identification
    Xu, Jing
    Xu, Fei
    Ma, Fanshu
    Zhou, Lei
    Jiang, Shuanglin
    Rao, Zhibo
    2021 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (DSC), 2021,
  • [24] Identifying In-App User Actions from Mobile Web Logs
    Priyogi, Bilih
    Sanderson, Mark
    Salim, Flora
    Chan, Jeffrey
    Tomko, Martin
    Ren, Yongli
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 300 - 311
  • [25] Detecting Web Crawlers from Web Server Access Logs with Data Mining Classifiers
    Stevanovic, Dusan
    An, Aijun
    Vlajic, Natalija
    FOUNDATIONS OF INTELLIGENT SYSTEMS, 2011, 6804 : 483 - 489
  • [26] A top-down algorithm for mining web access patterns from web logs
    Guo, JK
    Ruan, BJ
    Cheng, ZP
    Su, FZ
    Wang, YQ
    Deng, XB
    Shang, N
    Zhu, YY
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 838 - 843
  • [27] ELSV: An Effective Anomaly Detection System from Web Access Logs
    Wan, Wei
    Shi, Xin
    Wei, Jinxia
    Zhao, Jing
    Long, Chun
    2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021,
  • [28] A New Method for Detecting Users Behavior from Web Access Logs
    Sahu, Deepti
    Soni, Rishi
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 1003 - 1007
  • [30] Automatic extraction of user's search intention from web search logs
    Park, Kinam
    Jee, Hyesung
    Lee, Taemin
    Jung, Soonyoung
    Lim, Heuiseok
    MULTIMEDIA TOOLS AND APPLICATIONS, 2012, 61 (01) : 145 - 162