A keyword-based combination approach for detecting phishing webpages

Cited by: 28
Authors
Ding, Yan [1]
Luktarhan, Nurbol [1]
Li, Keqin [2]
Slamu, Wushour [1]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY USA
Funding
China Postdoctoral Science Foundation
Keywords
Heuristic rule; Machine learning; Phishing; Search engine; URL obfuscation techniques
DOI
10.1016/j.cose.2019.03.018
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes the Search & Heuristic Rule & Logistic Regression (SHLR) combination detection method, which targets the obfuscation techniques commonly used by phishing websites and improves the efficiency of filtering legitimate webpages. The method consists of three steps. First, the content of the webpage's title tag is submitted as search keywords to the Baidu search engine, and the webpage is considered legitimate if its domain matches the domain name of any of the top-10 search results; otherwise, further evaluation is performed. Second, a webpage that cannot be identified as legitimate is examined against heuristic rules defined by character features to determine whether it is a phishing page. These first two steps filter webpages quickly enough to meet the needs of real-time detection. Finally, a logistic regression classifier assesses the remaining pages to enhance the adaptability and accuracy of the detection method. The experimental results show that SHLR can filter 61.9% of legitimate webpages and identify 22.9% of phishing webpages based on uniform resource locator (URL) lexical information. The accuracy of SHLR is 98.9%. © 2019 Elsevier Ltd. All rights reserved.
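The three-stage cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search engine is abstracted as a caller-supplied function `search_top10`, the two rules in `matches_heuristic_rule` and the three features in `url_features` are hypothetical stand-ins for the paper's rule and feature sets, and the logistic regression weights are assumed to have been trained elsewhere.

```python
import ipaddress
import math
from typing import Callable, List, Sequence
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lower-cased host of a URL without a leading 'www.' (a simplification)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def passes_search_filter(url: str, title: str,
                         search_top10: Callable[[str], List[str]]) -> bool:
    """Step 1: legitimate if the page's domain is among the top-10 result domains."""
    result_domains = {domain_of(r) for r in search_top10(title)[:10]}
    domain = domain_of(url)
    return bool(domain) and domain in result_domains


def matches_heuristic_rule(url: str) -> bool:
    """Step 2: hypothetical character-based rules, not the paper's rule set."""
    parsed = urlparse(url)
    if "@" in parsed.netloc:  # user-info trick that hides the real host
        return True
    try:
        ipaddress.ip_address(parsed.hostname or "")
        return True  # raw IP address used in place of a domain name
    except ValueError:
        return False


def url_features(url: str) -> List[float]:
    """Illustrative lexical features: URL length, dot count, hyphen count."""
    return [float(len(url)), float(url.count(".")), float(url.count("-"))]


def phishing_probability(features: Sequence[float],
                         weights: Sequence[float], bias: float) -> float:
    """Step 3: logistic regression score with externally trained parameters."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(url: str, title: str,
             search_top10: Callable[[str], List[str]],
             weights: Sequence[float], bias: float,
             threshold: float = 0.5) -> str:
    """Run the cascade: search filter, then heuristic rules, then classifier."""
    if passes_search_filter(url, title, search_top10):
        return "legitimate"
    if matches_heuristic_rule(url):
        return "phishing"
    p = phishing_probability(url_features(url), weights, bias)
    return "phishing" if p >= threshold else "legitimate"
```

Ordering the stages this way means the cheaper checks run first and only the pages they cannot decide reach the classifier, which is the property the abstract credits for real-time suitability. A production version would likely compare registrable domains (for example via the Public Suffix List) rather than raw host names.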
Pages: 256-275
Page count: 20