Mapping documents onto Web page ontology

Cited by: 0
Authors
Mladenic, D [1]
Grobelnik, M
Affiliations
[1] Jozef Stefan Inst, Ljubljana, Slovenia
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The paper describes an approach to automatically mapping Web pages onto an ontology using document classification based on the Yahoo! ontology of Web pages. Techniques developed for learning on text data are used here on the hierarchical classification structure (ontology of Web documents). The high number of features is reduced by taking into account the hierarchical structure and using feature subset selection developed for the Naive Bayesian classifier. We focus on data sets with many features that also have a highly unbalanced class distribution. Documents are represented as word-vectors that include word sequences of up to five consecutive words. Based on the hierarchical structure, the problem is divided into subproblems, each representing one of the categories included in the Yahoo! hierarchy. The resulting model is a set of independent classifiers, each used to predict the probability that a new document is a member of the corresponding category represented as a node in the hierarchy. Our example problem is automatic document categorization, where we want to identify documents relevant to the selected category. Usually, only about 1%-10% of examples belong to the selected category. Experimental evaluation on real-world data shows that the proposed approach gives good results. Our experimental comparison of eleven feature scoring measures shows that considering data and algorithm characteristics significantly improves the performance.
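The pipeline outlined in the abstract (word-sequence features, feature subset selection, and one independent Naive Bayes classifier per category node) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `ngram_features`, `CategoryClassifier`, and `train_per_category`, the toy documents, the `top_k` cutoff, and the smoothing constants are all hypothetical. The abstract does not say which of the eleven scoring measures performed best, so the odds ratio used here is an assumption chosen only for illustration.

```python
import math
from collections import Counter


def ngram_features(text, max_n=5):
    """Count word sequences of 1..max_n consecutive words in a document."""
    words = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats


class CategoryClassifier:
    """Binary Naive Bayes model for one category node (member vs. non-member)."""

    def fit(self, docs, labels, top_k=50):
        n_pos = sum(labels)
        n_neg = len(labels) - n_pos
        counts = {True: Counter(), False: Counter()}    # feature occurrence counts
        doc_freq = {True: Counter(), False: Counter()}  # documents containing a feature
        for feats, y in zip(docs, labels):
            counts[bool(y)].update(feats)
            doc_freq[bool(y)].update(feats.keys())

        # Assumption: smoothed odds ratio as the feature scoring measure.
        def odds_ratio(f):
            p = (doc_freq[True][f] + 0.5) / (n_pos + 1)
            q = (doc_freq[False][f] + 0.5) / (n_neg + 1)
            return math.log(p * (1 - q) / ((1 - p) * q))

        # Keep the top_k highest-scoring features; ties are broken alphabetically.
        candidates = set(counts[True]) | set(counts[False])
        ranked = sorted(candidates, key=lambda f: (-odds_ratio(f), f))
        self.vocab = set(ranked[:top_k])

        # Laplace-smoothed priors and per-class feature likelihoods.
        self.log_prior = {
            True: math.log((n_pos + 1) / (len(labels) + 2)),
            False: math.log((n_neg + 1) / (len(labels) + 2)),
        }
        self.log_like = {}
        for y in (True, False):
            total = sum(counts[y][f] for f in self.vocab)
            self.log_like[y] = {
                f: math.log((counts[y][f] + 1) / (total + len(self.vocab)))
                for f in self.vocab
            }
        return self

    def predict_proba(self, feats):
        """Return the estimated probability that the document belongs to the category."""
        score = dict(self.log_prior)
        for f, c in feats.items():
            if f in self.vocab:
                for y in (True, False):
                    score[y] += c * self.log_like[y][f]
        m = max(score.values())
        z = sum(math.exp(s - m) for s in score.values())
        return math.exp(score[True] - m) / z


def train_per_category(texts, doc_categories, categories, top_k=50):
    """Train one independent classifier per category node."""
    docs = [ngram_features(t) for t in texts]
    return {
        c: CategoryClassifier().fit(docs, [c in cats for cats in doc_categories], top_k)
        for c in categories
    }


# Toy data (hypothetical), standing in for pages assigned to hierarchy nodes.
texts = [
    "machine learning for text classification",
    "naive bayes text learning",
    "football match results today",
    "tennis match results",
    "basketball scores today",
]
doc_categories = [{"Science"}, {"Science"}, {"Sports"}, {"Sports"}, {"Sports"}]
models = train_per_category(texts, doc_categories, ["Science", "Sports"])
new_doc = ngram_features("text learning")
probabilities = {c: m.predict_proba(new_doc) for c, m in models.items()}
```

Each classifier is trained on the same documents with a different positive class, so the per-category probabilities are independent and need not sum to one. A real hierarchy would also restrict each node's training set using the parent-child structure, which this sketch omits.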
Pages: 77-96
Number of pages: 20