Improve feature selection method of web page language identification using fuzzy ARTMAP

Cited by: 2
Authors
Ng C.-C. [1]
Selamat A. [1]
Affiliation
[1] Faculty of Computer Science and Information Systems, University of Technology Malaysia (UTM), Johor Bahru, Johor
Keywords
Feature selection; Fuzzy ARTMAP; N-grams frequency; Web page language identification;
DOI
10.1504/IJIIDS.2010.036897
Abstract
The amount of information available on the World Wide Web and in global information systems in languages other than English is increasing significantly. Several languages can be written in a single script; for example, Arabic, Persian, Urdu and Pashto all use Arabic script letters. The issue is how to produce reliable features from a web page that is to undergo language identification. Identifying the language incorrectly results in garbled translations as well as faulty and incomplete analyses. The aim of this study is to enhance the effectiveness of the feature selection method for web page language identification. We have investigated total N-grams, N-grams frequency, N-grams frequency document frequency, and N-grams frequency inverse document frequency for web page language identification. The experimental results show that N-grams frequency gives the most promising result compared to the other feature selection methods. Copyright © 2010 Inderscience Enterprises Ltd.
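The abstract names its N-gram feature variants without defining them, so the following is only a minimal sketch of how character N-gram frequency features are commonly computed, not the authors' implementation. The function names (`char_ngrams`, `ngram_frequency`, `document_frequency`, `ngram_tfidf`) are hypothetical, and the inverse-document-frequency weighting uses the conventional `log(N / df)` form, which is an assumption rather than a formula taken from the paper. The fuzzy ARTMAP classifier itself is not sketched.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequency(text: str, n: int = 2) -> dict[str, float]:
    """Relative frequency of each character n-gram within one document."""
    grams = char_ngrams(text, n)
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def document_frequency(docs: list[str], n: int = 2) -> Counter:
    """Number of documents in which each n-gram occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))
    return df


def ngram_tfidf(docs: list[str], n: int = 2) -> list[dict[str, float]]:
    """N-gram frequency weighted by log(N / df) (assumed, conventional form)."""
    df = document_frequency(docs, n)
    num_docs = len(docs)
    weighted = []
    for doc in docs:
        freqs = ngram_frequency(doc, n)
        weighted.append(
            {gram: f * math.log(num_docs / df[gram]) for gram, f in freqs.items()}
        )
    return weighted
```

Because the slicing works on Python strings, which are sequences of Unicode code points, the same functions apply unchanged to Arabic-script text. An n-gram that occurs in every document gets weight zero under the assumed `log(N / df)` form, which illustrates why plain frequency and inverse-document-frequency weighting can rank features differently.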
Pages: 629-642
Number of pages: 13
Related papers
50 in total
  • [1] Application of feature selection and fuzzy ARTMAP to intrusion detection
    Vilakazi, Christina B.
    Marwala, Tshilidzi
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006, : 4880 - +
  • [2] Web page feature selection and classification using neural networks
    Selamat, A
    Omatu, S
    [J]. INFORMATION SCIENCES, 2004, 158 : 69 - 88
  • [3] ARABIC SCRIPT WEB PAGE LANGUAGE IDENTIFICATION USING HYBRID-KNN METHOD
    Selamat, Ali
    Subroto, Imam Much Ibnu
    Ng, Choon-Ching
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2009, 8 (03) : 315 - 343
  • [4] Study of feature selection in the web page classification
    [J]. 2000, Shanghai Comp Soc, China (26):
  • [5] Improving Language Identification of Web Page Using Optimum Profile
    Ng, Choon-Ching
    Selamat, Ali
    [J]. SOFTWARE ENGINEERING AND COMPUTER SYSTEMS, PT 2, 2011, 180 : 157 - +
  • [6] Feature subset selection using genetic algorithms for Web page categorization
    Ying, XM
    Liu, M
    Dou, WH
    [J]. COMPUTER SCIENCE AND TECHNOLOGY IN NEW CENTURY, 2001, : 548 - 552
  • [7] Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network
    Davood Gharavian
    Mansour Sheikhan
    Alireza Nazerieh
    Sahar Garoucy
    [J]. Neural Computing and Applications, 2012, 21 : 2115 - 2126
  • [8] Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network
    Gharavian, Davood
    Sheikhan, Mansour
    Nazerieh, Alireza
    Garoucy, Sahar
    [J]. NEURAL COMPUTING & APPLICATIONS, 2012, 21 (08): : 2115 - 2126
  • [9] Feature Subset Selection Using a Fuzzy Method
    Cintra, Marcos Evandro
    Martin, Trevor P.
    Monard, Maria Carolina
    Camargo, Heloisa de Arruda
    [J]. 2009 INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS, VOL 2, PROCEEDINGS, 2009, : 214 - +
  • [10] Feature selection with rough sets for web page classification
    An, AJ
    Huang, YH
    Huang, XJ
    Cercone, N
    [J]. TRANSACTIONS ON ROUGH SETS II: ROUGH SETS AND FUZZY SETS, 2004, 3135 : 1 - 13