Learning DOM Trees of Web Pages by Subpath Kernel and Detecting Fake e-Commerce Sites

Cited by: 6
Authors
Shin, Kilho [1, 5]
Ishikawa, Taichi [2 ]
Liu, Yu-Lu [3 ]
Shepard, David Lawrence [4 ]
Affiliations
[1] Gakushuin Univ, Comp Ctr, Tokyo 1718588, Japan
[2] Carnegie Mellon Univ, Informat Networking Inst, Pittsburgh, PA 15213 USA
[3] Rakuten Inc, Cyber Secur Def Dept, Tokyo 1580094, Japan
[4] Evidat Hlth Inc, Data Engn, San Mateo, CA 94402 USA
[5] 1-5-1 Mejiro, Tokyo 1718588, Japan
Source
Funding
Japan Society for the Promotion of Science;
Keywords
fake site detection; kernel method; web security; EDIT DISTANCE; ALGORITHMS; ALIGNMENT;
DOI
10.3390/make3010006
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The subpath kernel is a class of positive definite kernels defined over trees. It has the following advantages for classification, regression, and clustering: it can be incorporated into a variety of powerful kernel machines, including SVM; it is invariant to whether input trees are ordered or unordered; it can be computed by fast linear-time algorithms; and its excellent learning performance has been demonstrated through intensive experiments in the literature. In this paper, we leverage recent advances in tree kernels to solve real problems. As an example, we apply our method to the problem of detecting fake e-commerce sites. Although the problem is similar to phishing site detection, the two problems differ clearly in that mimicking existing authentic sites is harmful for fake e-commerce sites. We focus on fake e-commerce site detection for three reasons: e-commerce fraud is a real problem that companies and law enforcement have been cooperating to solve; inefficiency hampers existing approaches because datasets tend to be large, whereas subpath kernel learning overcomes these performance challenges; and we offer increased resiliency against attempts to subvert existing detection methods by incorporating robust features that adversaries cannot change, namely the DOM trees of websites. Our real-world results are remarkable: our method exhibited accuracy as high as 0.998 when an SVM was trained with 1000 instances and evaluated on almost 7000 independent instances. Its generalization efficiency is also excellent: with only 100 training instances, the accuracy score reached 0.996.
Pages: 95-122
Page count: 28
Related papers
50 in total
  • [31] Automated classification of HTML forms on e-commerce web sites
    Ru, Yanbo
    Horowitz, Ellis
    ONLINE INFORMATION REVIEW, 2007, 31 (04) : 451 - 466
  • [32] The effects of usability and web design attributes on user preference for e-commerce web sites
    Lee, Sangwon
    Koubek, Richard J.
    COMPUTERS IN INDUSTRY, 2010, 61 (04) : 329 - 341
  • [33] Application of Unsupervised Learning in Detecting Behavioral Patterns in E-commerce Customers
    Udayan, J. Divya
    Moneesh, N.
    Vemulapalli, Nehith Sai
    Pruthvi, Paladugula
    Sakhamuri, Rakshith
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE, MACHINE LEARNING AND APPLICATIONS, VOL 1, ICDSMLA 2023, 2025, 1273 : 1208 - 1217
  • [34] User interface design for the web based E-commerce sites and cultural issues
    Kang, KS
    IC'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTERNET COMPUTING, VOLS 1 AND 2, 2003, : 346 - 349
  • [35] Ranking Criteria based on Fuzzy ANP for Assessing E-commerce Web Sites
    Rekik, Rim
    Kallel, Ilhem
    Alimi, Adel M.
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 3469 - 3474
  • [36] RDRP: Reward-Driven Request Prioritization for e-Commerce web sites
    Totok, Alexander
    Karamcheti, Vijay
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2010, 9 (06) : 549 - 561
  • [37] Automatic Extraction of Product Information from Multiple e-Commerce Web Sites
    Nasti, Samiah Jan
    Asger, M.
    Butt, Muheet Ahmad
    PROCEEDINGS OF RECENT INNOVATIONS IN COMPUTING, ICRIC 2019, 2020, 597 : 739 - 747
  • [38] Evaluation of Information Extraction Techniques to Label Extracted Data from e-Commerce Web Pages
    Anderson, Neil
    Hong, Jun
    WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 1275 - 1278
  • [39] On the social rational mirror: Learning E-commerce in a Web-served Learning Environment
    da Nóbrega, GM
    Cerri, SA
    Sallantin, J
    INTELLIGENT TUTORING SYSTEMS, 2002, 2363 : 41 - 50
  • [40] Theorizing the application of Transaction-Oriented Web usage Mining on typical e-commerce Web sites
    Tao, Yu-Hui
    Liu, Shu-Chu
    Jang, Min-Da
    Chao, Chian-Hsueng
    JOURNAL OF STATISTICS & MANAGEMENT SYSTEMS, 2013, 16 (4-5) : 257 - 292