Learning DOM Trees of Web Pages by Subpath Kernel and Detecting Fake e-Commerce Sites

Times cited: 6
Authors
Shin, Kilho [1, 5]
Ishikawa, Taichi [2]
Liu, Yu-Lu [3]
Shepard, David Lawrence [4]
Affiliations
[1] Gakushuin Univ, Comp Ctr, Tokyo 1718588, Japan
[2] Carnegie Mellon Univ, Informat Networking Inst, Pittsburgh, PA 15213 USA
[3] Rakuten Inc, Cyber Secur Def Dept, Tokyo 1580094, Japan
[4] Evidat Hlth Inc, Data Engn, San Mateo, CA 94402 USA
[5] 1-5-1 Mejiro, Tokyo 1718588, Japan
Source
Funding
Japan Society for the Promotion of Science
Keywords
fake site detection; kernel method; web security; edit distance; algorithms; alignment
DOI
10.3390/make3010006
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The subpath kernel is a class of positive definite kernels defined over trees that offers the following advantages for classification, regression and clustering: it can be incorporated into a variety of powerful kernel machines, including SVMs; it applies equally to ordered and unordered input trees; it can be computed by fast linear-time algorithms; and its strong learning performance has been demonstrated through extensive experiments in the literature. In this paper, we leverage recent advances in tree kernels to solve real-world problems. As an example, we apply our method to the problem of detecting fake e-commerce sites. Although this problem resembles phishing site detection, the two differ clearly: phishing sites mimic existing authentic sites, whereas such mimicry is harmful to fake e-commerce sites. We focus on fake e-commerce site detection for three reasons: e-commerce fraud is a real problem that companies and law enforcement have been cooperating to solve; existing approaches are hampered by inefficiency because datasets tend to be large, whereas subpath kernel learning overcomes these performance challenges; and we offer increased resilience against attempts to subvert existing detection methods by incorporating robust features that adversaries cannot change: the DOM trees of websites. Our real-world results are strong: our method achieved an accuracy as high as 0.998 when training an SVM on 1000 instances and evaluating it on almost 7000 independent instances. It also generalizes well from few examples: with only 100 training instances, accuracy reached 0.996.
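To make the idea of a kernel over DOM trees concrete, here is a minimal sketch under stated assumptions. The abstract does not define the kernel, so this assumes one simple formulation: every downward label path (a node followed by a chain of its descendants) is a feature, and the kernel is the inner product of the two trees' path-count vectors, optionally weighted by `decay ** length`. Because it is an inner product of feature vectors, it is positive definite, and because downward paths ignore sibling order, it gives the same value for ordered and unordered trees. The `(label, children)` tuple representation and the names `subpath_counts` and `subpath_kernel` are hypothetical. This naive enumeration is not the linear-time algorithm referred to above.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical tree representation: (tag label, list of child trees).
Tree = Tuple[str, List["Tree"]]


def subpath_counts(tree: Tree) -> Counter:
    """Count every downward label path (a node, then a chain of descendants)."""
    counts: Counter = Counter()

    def visit(node: Tree) -> List[Tuple[str, ...]]:
        # Return all downward paths that start at `node`, recording each one.
        label, children = node
        paths = [(label,)]
        for child in children:
            for path in visit(child):
                paths.append((label,) + path)
        counts.update(paths)
        return paths

    visit(tree)
    return counts


def subpath_kernel(t1: Tree, t2: Tree, decay: float = 1.0) -> float:
    """Naive subpath kernel: weighted inner product of path-count vectors.

    Assumed weighting: a shared path of length L contributes decay ** L
    per pair of occurrences. Not a linear-time algorithm.
    """
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(n * c2[p] * decay ** len(p) for p, n in c1.items() if p in c2)


# Toy DOM trees (illustrative only, not data from the paper).
a = ("html", [("head", [("title", [])]), ("body", [("div", [("a", [])])])])
# Same tree with the <head> and <body> siblings swapped.
b = ("html", [("body", [("div", [("a", [])])]), ("head", [("title", [])])])
# A different page that shares only html, body and html/body.
c = ("html", [("body", [("form", [("input", [])])])])

print(subpath_kernel(a, a), subpath_kernel(a, b), subpath_kernel(a, c))
```

Here `a` has 15 distinct downward paths, so `subpath_kernel(a, a)` and `subpath_kernel(a, b)` are both 15 (sibling order does not matter), while `a` and `c` share only the paths `html`, `body` and `html/body`, giving 3. A Gram matrix of such values could be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.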
Pages: 95-122
Number of pages: 28