Learning DOM Trees of Web Pages by Subpath Kernel and Detecting Fake e-Commerce Sites

Cited by: 6
Authors
Shin, Kilho [1, 5]
Ishikawa, Taichi [2]
Liu, Yu-Lu [3]
Shepard, David Lawrence [4]
Affiliations
[1] Gakushuin Univ, Comp Ctr, Tokyo 1718588, Japan
[2] Carnegie Mellon Univ, Informat Networking Inst, Pittsburgh, PA 15213 USA
[3] Rakuten Inc, Cyber Secur Def Dept, Tokyo 1580094, Japan
[4] Evidat Hlth Inc, Data Engn, San Mateo, CA 94402 USA
[5] 1-5-1 Mejiro, Tokyo 1718588, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
fake site detection; kernel method; web security; EDIT DISTANCE; ALGORITHMS; ALIGNMENT;
DOI
10.3390/make3010006
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The subpath kernel is a class of positive definite kernels defined over trees, with the following advantages for classification, regression and clustering: it can be incorporated into a variety of powerful kernel machines, including SVM; it is invariant to whether input trees are ordered or unordered; it can be computed by fast linear-time algorithms; and its excellent learning performance has been demonstrated through intensive experiments in the literature. In this paper, we leverage recent advances in tree kernels to solve real problems. As an example, we apply our method to the problem of detecting fake e-commerce sites. Although the problem is similar to phishing site detection, the two problems differ clearly in that mimicking existing authentic sites is harmful for fake e-commerce sites. We focus on fake e-commerce site detection for three reasons: e-commerce fraud is a real problem that companies and law enforcement have been cooperating to solve; existing approaches are hampered by inefficiency because datasets tend to be large, whereas subpath kernel learning overcomes these performance challenges; and we offer increased resilience against attempts to subvert existing detection methods by incorporating robust features that adversaries cannot change: the DOM trees of websites. Our real-world results are remarkable: our method achieved accuracy as high as 0.998 when training an SVM with 1000 instances and evaluating on almost 7000 independent instances. It also generalizes well from little data: with only 100 training instances, the accuracy reached 0.996.
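To make the idea of a subpath kernel concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common formulation in which the kernel value is the sum, over every downward label path (subpath) shared by two trees, of the product of its occurrence counts. The `(label, children)` tree encoding, the function names, and the `decay` parameter are all hypothetical choices made for illustration, and the enumeration is naive rather than the linear-time algorithm the abstract refers to.

```python
from collections import Counter


def subpath_counts(tree):
    """Count every downward label path (subpath) in a tree.

    A tree is a (label, children) pair; children is a list of trees.
    This encoding is an assumption made for the example.
    """
    counts = Counter()
    stack = [(tree, ())]  # (node, labels of its ancestors from the root)
    while stack:
        (label, children), ancestors = stack.pop()
        path = ancestors + (label,)
        # Every suffix of the root-to-node path is a subpath ending here.
        for start in range(len(path)):
            counts[path[start:]] += 1
        for child in children:
            stack.append((child, path))
    return counts


def subpath_kernel(t1, t2, decay=1.0):
    """Naive subpath kernel: sum of count products over shared subpaths.

    `decay` (hypothetical) down-weights or up-weights longer subpaths;
    decay=1.0 gives plain counting.
    """
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    if len(c1) > len(c2):
        c1, c2 = c2, c1  # iterate over the smaller table
    return sum(n * c2[p] * decay ** len(p) for p, n in c1.items() if p in c2)


# Two toy DOM trees (made-up data, not from the paper).
page_a = ("html", [("body", [("div", [("a", [])]), ("div", [("img", [])])])])
page_b = ("html", [("body", [("div", [("a", [])])])])
# Same as page_a with the two <div> children swapped.
page_a_swapped = ("html", [("body", [("div", [("img", [])]), ("div", [("a", [])])])])

print(subpath_kernel(page_a, page_b))
print(subpath_kernel(page_a, page_a_swapped) == subpath_kernel(page_a, page_a))
```

Because subpaths only follow parent-to-child edges, reordering siblings leaves the counts unchanged, which is the order invariance mentioned in the abstract. A Gram matrix of such kernel values could then be passed to a kernel machine such as an SVM that accepts precomputed kernels.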
Pages: 95-122
Number of pages: 28