Learning DOM Trees of Web Pages by Subpath Kernel and Detecting Fake e-Commerce Sites

Cited by: 6
Authors
Shin, Kilho [1, 5]
Ishikawa, Taichi [2 ]
Liu, Yu-Lu [3 ]
Shepard, David Lawrence [4 ]
Affiliations
[1] Gakushuin Univ, Comp Ctr, Tokyo 1718588, Japan
[2] Carnegie Mellon Univ, Informat Networking Inst, Pittsburgh, PA 15213 USA
[3] Rakuten Inc, Cyber Secur Def Dept, Tokyo 1580094, Japan
[4] Evidat Hlth Inc, Data Engn, San Mateo, CA 94402 USA
[5] 1-5-1 Mejiro, Tokyo 1718588, Japan
Source
Funding
Japan Society for the Promotion of Science
Keywords
fake site detection; kernel method; web security; EDIT DISTANCE; ALGORITHMS; ALIGNMENT;
DOI
10.3390/make3010006
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The subpath kernel is a class of positive definite kernels defined over trees. It has several advantages for classification, regression and clustering: it can be incorporated into a variety of powerful kernel machines, including SVM; it gives the same value whether input trees are treated as ordered or unordered; it can be computed by fast linear-time algorithms; and its excellent learning performance has been demonstrated through intensive experiments in the literature. In this paper, we leverage recent advances in tree kernels to solve real problems. As an example, we apply our method to the problem of detecting fake e-commerce sites. Although the problem resembles phishing site detection, the two differ in one clear respect: mimicking an existing authentic site is harmful for a fake e-commerce site. We focus on fake e-commerce site detection for three reasons: e-commerce fraud is a real problem that companies and law enforcement have been cooperating to solve; existing approaches are hampered by inefficiency because datasets tend to be large, whereas subpath kernel learning overcomes these performance challenges; and we offer increased resilience against attempts to subvert existing detection methods by incorporating robust features that adversaries cannot change, namely the DOM trees of websites. Our real-world results are remarkable: our method achieved accuracy as high as 0.998 when an SVM was trained with 1000 instances and evaluated on almost 7000 independent instances. Its generalization efficiency is also excellent: with only 100 training instances, the accuracy reached 0.996.
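To make the idea in the abstract concrete, the following is a minimal sketch of a subpath kernel on small labeled trees. It is not the authors' implementation: it enumerates subpaths naively rather than using the linear-time algorithms the abstract mentions, it assumes an unweighted kernel (no decay factor), and the `(label, children)` tuple representation and the names `subpath_counts` and `subpath_kernel` are hypothetical choices made for illustration. Here a subpath is taken to be a downward sequence of parent-to-child node labels.

```python
from collections import Counter

# Hypothetical tree representation: (label, [child, child, ...]).

def subpath_counts(tree):
    """Count every downward label path (subpath) occurring in the tree."""
    counts = Counter()

    def walk(node):
        label, children = node
        # All downward paths that start at this node.
        paths = [(label,)]
        for child in children:
            for p in walk(child):
                paths.append((label,) + p)
        counts.update(paths)
        return paths

    walk(tree)
    return counts

def subpath_kernel(t1, t2):
    """Unweighted subpath kernel: sum over shared subpaths of count products."""
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(n * c2[p] for p, n in c1.items())

# Toy DOM-like trees (illustrative only, not data from the paper).
t1 = ("html", [("body", [("div", []), ("div", [])])])
t2 = ("html", [("body", [("div", []), ("p", [])])])
t2_swapped = ("html", [("body", [("p", []), ("div", [])])])

print(subpath_kernel(t1, t2))          # 9
print(subpath_kernel(t1, t2_swapped))  # 9, child order does not matter
```

Because only parent-to-child label sequences are counted, reordering siblings leaves the value unchanged, which matches the ordered/unordered invariance claimed in the abstract. A Gram matrix built from such values could in principle be passed to any kernel machine that accepts a precomputed kernel.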
Pages: 95-122
Page count: 28