Automated Detection of Malevolent Domains in Cyberspace Using Natural Language Processing and Machine Learning

Authors
Samad, Saleem Raja Abdul [1]
Ganesan, Pradeepa [1]
Al-Kaabi, Amna Salim [1]
Rajasekaran, Justin
Singaravelan, M. [2 ]
Basha, Peerbasha Shebbeer [3 ]
Affiliations
[1] Univ Technol & Appl Sci Ibri, Coll Comp & Informat Sci, IT Dept, Shinas, Oman
[2] Vel Tech Rangarajan Dr Sagunthala R&D Inst Sci & T, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[3] Jamal Mohamed Coll, Dept Comp Sci, Tiruchirappalli, Tamil Nadu, India
Keywords
Machine learning; N-gram; linguistic features; natural language processing (NLP); malicious webpage
DOI
10.14569/IJACSA.2024.0151036
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Cyberattacks are intentional attacks on computer systems, networks, and devices. Malware, phishing, drive-by downloads, and injection are common cyberattacks that can harm individuals, businesses, and organizations. Most of these attacks trick internet users with malicious links or webpages. Malicious webpages can be used to distribute malware, steal personal information, conduct phishing attacks, or perform other malicious activities. Detecting such webpages manually is a tedious task for internet users, so locating them in cyberspace requires an automated detection tool. Machine learning techniques are currently used for this purpose. The majority of recent studies derive a limited number of features from webpages (both benign and malicious) and use machine learning (ML) algorithms to detect fraudulent webpages. However, such limited feature sets may not exploit the full potential of the dataset. This study addresses the issue by identifying malicious websites using both URL features and webpage content features. To maximize detection accuracy, both n-grams and vectorization methods from natural language processing are adopted with a minimal feature set. To exploit the full potential of the dataset, the proposed approach derives 22 common linguistic features from the URL and generates n-grams from the domain name of the URL. The textual content of the webpages is also used. The study employs seven machine learning algorithms with three vectorization methods. The results show that the proposed method outperforms previous studies.
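
The feature-extraction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`domain_of`, `char_ngrams`, `url_features`, `count_vectorize`) are hypothetical, the five lexical features are illustrative stand-ins for the 22 linguistic features (which the abstract does not enumerate), character-level n-grams are an assumption, and plain count vectorization is used as one possible vectorization method.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host name of a URL, without port or a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text` (assumed n-gram unit)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def url_features(url):
    """A few illustrative lexical URL features; NOT the paper's 22-feature set."""
    domain = domain_of(url)
    return {
        "url_length": len(url),
        "domain_length": len(domain),
        "digit_count": sum(c.isdigit() for c in url),
        "dot_count": domain.count("."),
        "hyphen_count": domain.count("-"),
    }


def count_vectorize(docs, n):
    """Build a sorted n-gram vocabulary and one count vector per document."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]


if __name__ == "__main__":
    # Hypothetical example URLs, used only to exercise the functions.
    urls = ["http://www.example.com/login", "http://secure-paypa1.example.net/verify"]
    domains = [domain_of(u) for u in urls]
    vocab, vectors = count_vectorize(domains, 3)
    print(len(vocab), [url_features(u) for u in urls])
```

The resulting count vectors and lexical features could then be concatenated and passed to any standard classifier; which seven algorithms and three vectorizers the study uses is not stated in the abstract.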
Pages: 328-341
Page count: 14