Web outlier mining: Discovering outliers from web datasets

Cited by: 2
Authors
Agyemang, Malik [1]
Barker, Ken [1]
Alhajj, Reda [1, 2]
Affiliations
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
[2] Global Univ, Dept Comp Sci, Beirut, Lebanon
Keywords
web outliers; content-specific algorithm; taxonomy; web mining; embedded motifs
DOI
10.3233/IDA-2005-9505
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Exception mining in large datasets is an important task in traditional data mining, with applications in credit card fraud detection, weather prediction, intrusion detection, and cellular phone cloning fraud detection, among others. Sifting through dynamic, unstructured, and ever-growing web data for outliers is more challenging than finding outliers in numeric datasets. Yet existing outlier mining algorithms are restricted to numeric datasets, leaving web outlier mining an open research issue. Web outliers are web data that show significantly different characteristics from other web data taken from the same category. Although the presence of web outliers appears obvious, algorithms for mining them are currently unavailable. Moreover, traditional outlier mining algorithms designed solely for numeric datasets cannot be used on web datasets, because web datasets typically contain multimedia. This paper establishes the presence of outliers on the web, called web outliers, and proposes a general framework for mining them. A web outlier taxonomy is reported that supports the development of content-specific algorithms for mining web outliers. Finally, we propose the WCO-Mine algorithm for mining web content outliers. Experimental results demonstrate that WCO-Mine is capable of finding web outliers in web datasets.
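The abstract's definition of a web outlier (a page whose characteristics differ markedly from other pages in the same category) can be illustrated with a small sketch. This is not the WCO-Mine algorithm, whose details are not given in this record; it is a generic stand-in that assumes plain-text pages and uses word-set Jaccard similarity. The function names `tokens`, `jaccard`, and `rank_by_dissimilarity` and the sample pages are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase alphabetic words in one page's text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_dissimilarity(pages):
    """Return (index, score) pairs sorted from most to least dissimilar.

    score = 1 - mean Jaccard similarity to every other page in the category.
    This is an illustrative measure, not the scoring used by WCO-Mine.
    """
    sets = [tokens(p) for p in pages]
    scores = []
    for i, s in enumerate(sets):
        others = [jaccard(s, t) for j, t in enumerate(sets) if j != i]
        mean_sim = sum(others) / len(others) if others else 0.0
        scores.append((i, 1.0 - mean_sim))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical pages that were all filed under one category.
pages = [
    "credit card fraud detection with data mining",
    "data mining for credit card fraud",
    "fraud detection in credit card data",
    "recipes for chocolate cake and vanilla frosting",
]
ranking = rank_by_dissimilarity(pages)
print(ranking[0][0])  # index of the page least similar to the rest
```

In this toy category the fourth page shares almost no vocabulary with the others, so it ranks first as the candidate content outlier. A real system would also need to handle the structure, links, and multimedia that the abstract mentions.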
Pages: 473-486
Page count: 14