TEXT CONTENT ANALYSIS FOR ILLICIT WEB PAGES BY USING NEURAL NETWORKS

Authors
Sam, Lee Zhi [1]
Maarof, Mohd Aizaini [1]
Selamat, Ali [1]
Shamsuddin, Siti Mariyam [1]
Affiliation
[1] Univ Teknol Malaysia, Fac Comp Sci & Informat Syst FSKSM, Skudai 81310, Johor, Malaysia
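Source
JURNAL TEKNOLOGI | 2009 / Vol. 50
Keywords
Artificial neural network; term weighting scheme; textual content analysis; web pages classification
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract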
Illicit web content such as pornography, violence, and gambling has greatly polluted the minds of web users, especially children and teenagers. Because popular web filtering techniques such as Uniform Resource Locator (URL) blocking and Platform for Internet Content Selection (PICS) checking are ineffective against today's dynamic web content, content-based analysis techniques with an effective model are highly desired. In this paper, we propose a textual content analysis model that uses an entropy term weighting scheme to classify pornography and sex education web pages. We compare the entropy scheme with two other common term weighting schemes, TFIDF and Glasgow. These techniques were tested with an artificial neural network using a small class dataset. In this study, we found that the proposed model achieved better performance in terms of accuracy, convergence speed, and stability than the other techniques.
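
The abstract names the term weighting schemes but does not give their formulas. As a rough illustration only, the sketch below computes TFIDF and one common log-entropy formulation for a toy corpus. The function names, the whitespace tokenizer, and the specific log-entropy variant are assumptions made for this example and are not taken from the paper; the Glasgow scheme and the neural network classifier are not reproduced.

```python
import math
from collections import Counter


def term_counts(docs):
    """Count term occurrences per document (naive whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_weights(counts):
    """TFIDF: raw term frequency times log(N / document frequency)."""
    n_docs = len(counts)
    df = Counter(term for c in counts for term in c)
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        for c in counts
    ]


def log_entropy_weights(counts):
    """Log-entropy (assumed variant): log(1 + tf) times a global entropy weight.

    The global weight is 1 + sum_j(p_ij * log p_ij) / log N, where p_ij is the
    share of term i's total occurrences that fall in document j.
    """
    n_docs = len(counts)
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)
    global_w = {}
    for t, total in global_freq.items():
        entropy = sum(
            (c[t] / total) * math.log(c[t] / total) for c in counts if c[t] > 0
        )
        global_w[t] = 1.0 + entropy / math.log(n_docs)
    return [
        {t: math.log(1 + tf) * global_w[t] for t, tf in c.items()}
        for c in counts
    ]


if __name__ == "__main__":
    docs = ["health clinic advice", "health advice advice", "health casino"]
    counts = term_counts(docs)
    print(tfidf_weights(counts)[2])
    print(log_entropy_weights(counts)[2])
```

In this formulation, a term spread evenly across all documents gets a global weight near zero, while a term concentrated in a single document keeps its full local weight. This is the general intuition behind using entropy to pick out class-discriminating terms.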
Pages: 19