Website Classification Using Word Based Multiple N-Gram Models And Random Search Oriented Feature Parameters

Authors
Shawon, Ashadullah [1 ]
Zuhori, Syed Tauhid [1 ]
Mahmud, Firoz [1 ]
Rahman, Md Jamil-Ur [1 ]
Affiliation
[1] Rajshahi Univ Engn & Technol, Dept Comp Sci & Engn, Rajshahi, Bangladesh
Keywords
Multiple N-gram Models; Random Search; URL Classification; Website Classification; Multinomial Naive Bayes; Web Mining
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Website classification is a convenient starting point for building intelligent web browsers and social networking sites that can learn a user's preferred categories and detect adult or harmful websites. Classifying websites using only the information in the Uniform Resource Locator (URL) is an important and fast technique. URL classification must be highly accurate to be usable in real-world applications, so we propose an improved approach that gives better results. We introduce word-based multiple n-gram models for efficient feature extraction and a multinomial distribution for the Naïve Bayes classifier, under a Random Search pipeline for hyperparameter optimization that finds the best parameters of the URL features. We compare our experimental results with those of previous research and show an improvement over the existing results. Our approach achieves 88.77% recall and an 87.63% F1-score, which is the best performance so far.
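The pipeline named in the abstract (word-based n-grams of several orders, a multinomial Naive Bayes classifier, and random search over the feature parameters) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the tokenizer, the searched ranges for the n-gram orders and the smoothing value alpha, the accuracy-based selection, and the toy URLs and labels are all assumptions made for illustration.

```python
import math
import random
import re
from collections import Counter, defaultdict

def url_words(url):
    """Split a URL into lowercase alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", url.lower())

def word_ngrams(words, n_min, n_max):
    """Return the word n-grams for every order n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace/Lidstone) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.counts = defaultdict(Counter)   # per-class n-gram counts
        self.class_docs = Counter(labels)    # per-class document counts
        for grams, y in zip(docs, labels):
            self.counts[y].update(grams)
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, grams):
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)

        def score(y):
            s = math.log(self.class_docs[y] / n_docs)      # log prior
            for g in grams:
                if g in self.vocab:                        # skip unseen n-grams
                    s += math.log((self.counts[y][g] + self.alpha)
                                  / (self.totals[y] + self.alpha * v))
            return s

        return max(sorted(self.class_docs), key=score)

def random_search(train, valid, n_iter=20, seed=0):
    """Sample (n_min, n_max, alpha) at random; keep the best validation accuracy."""
    rng = random.Random(seed)
    best_score, best_params = -1.0, None
    for _ in range(n_iter):
        n_min = rng.randint(1, 2)
        n_max = rng.randint(n_min, 3)
        alpha = 10 ** rng.uniform(-2, 0)                   # log-uniform in [0.01, 1]
        feats = lambda u: word_ngrams(url_words(u), n_min, n_max)
        model = MultinomialNB(alpha).fit([feats(u) for u, _ in train],
                                         [y for _, y in train])
        acc = sum(model.predict(feats(u)) == y for u, y in valid) / len(valid)
        if acc > best_score:
            best_score = acc
            best_params = {"n_min": n_min, "n_max": n_max, "alpha": alpha}
    return best_score, best_params

# Hypothetical toy data, not taken from the paper's data set.
TRAIN = [
    ("http://www.example-sports.com/football/scores", "sports"),
    ("http://news.example.org/sports/football-league", "sports"),
    ("http://www.example.com/basketball/scores/live", "sports"),
    ("http://shop.example.com/buy/shoes/online", "shopping"),
    ("http://www.example-store.com/shop/cart/checkout", "shopping"),
    ("http://deals.example.net/buy/online/discount", "shopping"),
]
VALID = [
    ("http://example.com/football/live", "sports"),
    ("http://example.com/shop/online/shoes", "shopping"),
]

best_score, best_params = random_search(TRAIN, VALID)
print(best_score, best_params)
```

Random search is used here instead of an exhaustive grid because it samples the continuous alpha range directly. A real experiment would select on recall or F1-score with cross-validation rather than accuracy on a two-URL hold-out set.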
Pages: 6