Web Proxy Log Classification for Burst Behavior

Authors
Kiatkumjounwong, Nattapol [1]
Ngamsuriyaroj, Sudsanguan [1]
Plangprasopchok, Anon [2]
Affiliations
[1] Mahidol Univ, Fac Informat & Commun Technol, Nakhon Pathom, Thailand
[2] Natl Elect & Comp Technol Ctr, Thailand Sci Pk, Pathum Thani, Thailand
Keywords
Web proxy logs; Log classification; Outlier detection; File categories; File types;
DOI
Not available
Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract
Many organizations and most Internet service providers need to keep a history of web accesses in the form of proxy logs. Such logs are later used to analyze web usage and to investigate abnormal activities, including abuse, misuse, or fraud. This paper classifies web proxy logs into three classes: normal, non-burst, and burst. To filter out normal logs, we use the Apriori algorithm in the Weka mining tool to detect outliers based on the duration and the bandwidth of logs for each file category. Burst logs are then separated from the remaining outlier logs using threshold rates computed for each file type. The experimental results show that normal logs form the majority, at about 80%, while burst logs account for about 2% and should be further investigated for unusual behavior. Because the volume of stored logs can be very large, processing them in a timely manner is difficult. Thus, we measure the performance of parallel log processing on a Hadoop system while varying the data size and the number of nodes. We found that the speedup of log processing corresponds well to the increasing workload, which suggests that processing logs in real time is feasible.
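
The three-way split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (`ProxyLog`), the `classify` helper, and the threshold values are hypothetical, and the outlier test is passed in as a plain predicate rather than derived with Apriori in Weka. The sketch also assumes that a log is a burst when its transfer rate exceeds the threshold rate for its file type, which the abstract does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProxyLog:
    # Hypothetical record layout; real proxy logs carry many more fields.
    file_type: str     # e.g. "mp4", "zip"
    duration_s: float  # transfer duration in seconds
    size_bytes: int    # bytes transferred


def classify(
    log: ProxyLog,
    is_outlier: Callable[[ProxyLog], bool],
    threshold_rates: Dict[str, float],
) -> str:
    """Return "normal", "non-burst", or "burst" for a single log entry."""
    # Step 1: anything the outlier test does not flag is treated as normal.
    if not is_outlier(log):
        return "normal"
    # Step 2: compute the transfer rate in bytes per second.
    # Assumption: a zero-duration transfer is treated as an unbounded rate.
    if log.duration_s > 0:
        rate = log.size_bytes / log.duration_s
    else:
        rate = float("inf")
    # Step 3: compare with the per-file-type threshold rate.
    # Assumption: file types without a threshold are never marked as burst.
    limit = threshold_rates.get(log.file_type)
    if limit is not None and rate > limit:
        return "burst"
    return "non-burst"
```

For example, with a made-up threshold of 1,000,000 bytes per second for `mp4` files, an outlier log that moved 50 MB in 10 seconds would be labeled `burst`, while an outlier log that moved 20 MB in 100 seconds would be labeled `non-burst`.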
Pages: 472-477
Number of pages: 6