A Genetics Clustering-based Approach for Weblog Data Cleaning

Authors
Ganibardi, Amine [1]
Ali, Cherif Arab [1]
Affiliations
[1] Univ Vincennes St Denis, Adv Informat Lab, 2 Rue Liberte, F-93526 St Denis, France
Keywords
Web Usage Mining; Web Usage Data Preprocessing; Weblog Data Cleaning; Genetics Clustering; Genetics Clustering-based Cleaning; DISCOVERY; KNOWLEDGE
DOI
10.1109/ES.2018.00019
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper addresses weblog data cleaning within the scope of Web Usage Mining. Weblog data record end-user clicks and the underlying user-agent hits logged by web servers. Because Web Usage Mining is concerned with end-user clicks, user-agent hits are treated as noise to be cleaned before mining. The most referenced and implemented cleaning methods are conventional and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of weblog databases. In the context of the dynamic and responsive web, these methods are limited in relevancy, workability, and cost. To address these limitations, this contribution introduces a clustering-based cleaning method focused on the genetic features of the logging structure. The method separates clicks from hits on the basis of their underlying genetic features and statistical properties. Experiments with genetics clustering-based cleaning demonstrate significant advantages over the content-centric methods.
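To make the content-centric baseline concrete, the following is a minimal sketch of a filter keyed on the requested-resource attribute. It is not the authors' implementation: the extension list `NOISE_EXTENSIONS` and the helper names `is_probable_click` and `clean` are hypothetical, chosen only for illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as embedded resources (user-agent hits).
# The abstract does not give the actual rules used by conventional cleaning.
NOISE_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg", ".woff", ".woff2",
}


def is_probable_click(requested_resource: str) -> bool:
    """Return True when the requested resource does not look like an embedded asset."""
    # Drop the query string and fragment, then inspect the file extension only.
    path = urlsplit(requested_resource).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in NOISE_EXTENSIONS


def clean(requests):
    """Keep the requests that the extension heuristic classifies as clicks."""
    return [r for r in requests if is_probable_click(r)]


if __name__ == "__main__":
    sample = ["/index.html", "/static/app.js?v=3", "/img/logo.PNG", "/products?id=7"]
    print(clean(sample))
```

A heuristic of this kind inspects only the resource name, so a script-issued request with no file extension would pass as a click. This is one plausible reading of the relevancy limitation the abstract attributes to content-centric methods.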
Pages: 75-81
Number of pages: 7