Automatic Genre Classification of Web Documents Using Discriminant Analysis for Feature Selection

Cited by: 2
Authors
Maeda, Akira [1]
Hayashi, Yukinori [2]
Affiliations
[1] Ritsumeikan Univ, Coll Informat Sci & Engn, Kyoto, Japan
[2] Exa Corp, Burlington, MA USA
DOI
10.1109/ICADIWT.2009.5273844
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a method to classify Web documents by genre (not by topic) based on features of words and HTML tags. For classification, we use SVM (Support Vector Machine) and Naive Bayes. To improve classification accuracy, we calculate the discriminant efficiency of each pair of a word and an HTML tag to identify the HTML tags that are effective for classification. The experimental results show that our method using discriminant efficiencies achieves an 8% increase in classification accuracy.
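The feature representation mentioned in the abstract (pairs of a word and an HTML tag) can be illustrated with a minimal sketch. This record does not give the paper's definition of discriminant efficiency or its exact feature extraction, so the code below covers only one plausible way to count (tag, word) pairs, attributing each word to its innermost enclosing tag. The names `TagWordPairCounter`, `tag_word_features`, and `VOID_TAGS` are hypothetical and not taken from the paper.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Elements that never have a closing tag; they cannot enclose text.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagWordPairCounter(HTMLParser):
    """Count (innermost enclosing tag, word) pairs in an HTML document.

    Illustrative assumption only: the paper's actual features may differ.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []     # stack of currently open tags
        self.pairs = Counter()  # (tag, word) -> occurrence count

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating
        # unclosed inner tags; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.open_tags:
            return
        tag = self.open_tags[-1]
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.pairs[(tag, word)] += 1


def tag_word_features(html):
    """Return a Counter mapping (tag, word) pairs to their counts."""
    parser = TagWordPairCounter()
    parser.feed(html)
    parser.close()
    return parser.pairs


features = tag_word_features(
    "<html><body><h1>Product FAQ</h1><p>Ask a question</p></body></html>"
)
```

Counts like these could serve as input features for a classifier such as SVM or Naive Bayes; how the paper scores and selects the pairs is not specified in this record.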
Pages: 405 / +
Page count: 2
Related papers
50 in total
  • [1] Feature Selection in Automatic Music Genre Classification
    Silla, Carlos N., Jr.
    Koerich, Alessandro L.
    Kaestner, Celso A. A.
    [J]. ISM: 2008 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, 2008, : 39 - +
  • [2] Multiple sets of features for automatic genre classification of web documents
    Lim, CS
    Lee, KJ
    Kim, GC
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (05) : 1263 - 1276
  • [3] A FEATURE SELECTION APPROACH FOR AUTOMATIC MUSIC GENRE CLASSIFICATION
    Silla, Carlos N., Jr.
    Koerich, Alessandro L.
    Kaestner, Celso A. A.
    [J]. INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2009, 3 (02) : 183 - 208
  • [4] Feature selection and text classification for Chinese web documents
    Xu, JC
    Liu, DY
    Hu, M
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1304 - 1309
  • [5] Automatic genre detection of Web documents
    Lim, CS
    Lee, KJ
    Kim, GC
    [J]. NATURAL LANGUAGE PROCESSING - IJCNLP 2004, 2005, 3248 : 310 - 319
  • [6] Variable Global Feature Selection Scheme for automatic classification of text documents
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 81 : 268 - 281
  • [7] Audio Feature Reduction and Analysis for Automatic Music Genre Classification
    Baniya, Babu Kaji
    Lee, Joonwhoan
    Li, Ze-Nian
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 457 - 462
  • [8] A Novel Feature Selection Framework for Automatic Web Page Classification
    Mangai, J. Alamelu
    Kumar, V. Santhosh
    Balamurugan, S. Appavu Alias
    [J]. INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2012, 9 (04) : 442 - 448