Rule identification from Web pages by the XRML approach

Cited by: 19
Authors
Kang, J
Lee, JK
Affiliations
[1] Ajou Univ, Sch Business Adm, Suwon 443749, South Korea
[2] Korea Adv Inst Sci & Technol, Grad Sch Management, Seoul 130012, South Korea
Keywords
rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML;
DOI
10.1016/j.dss.2005.01.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the world of Web pages, there are oceans of documents in natural language texts and tables. To extract rules from Web pages and maintain consistency between them, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows the identification of rules on Web pages and generates the identified rules automatically. For this purpose, we have designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML), both as parts of XRML. RIML 2.0 is designed to identify rules not only from texts, but also from tables on Web pages, and to transform them into formal rules in RSML syntax automatically. While designing RIML 2.0, we considered the features of sharing variables and values, omitted terms, and synonyms. We have conducted an experiment to evaluate the potential benefit of the XRML approach with real-world Web pages of Amazon.com, BarnesandNoble.com, and Powells.com. We found that 100.0% of the rules and 99.7% of the rule components could be identified and automatically generated if we do not count the statements for linkages, which generically do not exist on the Web pages. Since the linkage components occupy 11.2% of all components in the rule base, the overall limitation of automatic rule generation is 88.8%. In this setting, 88.5% of the overall rule components could be generated from the identified rules from the Web pages. The result provides solid proof that XRML can facilitate the extraction and maintenance of rules from Web pages while building expert systems in the Semantic Web environment. (c) 2005 Elsevier B.V. All rights reserved.
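The percentages quoted in the abstract can be cross-checked with a short calculation. The sketch below is only an illustration: the variable names are hypothetical, and it assumes the 99.7% identification rate applies uniformly to the non-linkage components, which is how the abstract appears to use it.

```python
# Figures quoted in the abstract, expressed as fractions.
linkage_share = 0.112           # share of rule components that are linkage statements
non_linkage_identified = 0.997  # share of non-linkage components identified and generated

# Upper bound on automatic generation: everything except the linkage components.
upper_bound = 1.0 - linkage_share

# Assumption: the 99.7% rate applies uniformly to the non-linkage components.
overall_generated = upper_bound * non_linkage_identified

print(f"upper bound: {upper_bound:.1%}")
print(f"overall generated: {overall_generated:.1%}")
```

Under that assumption the product rounds to the 88.5% reported in the abstract, so the three figures are mutually consistent.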
Pages: 205-227
Page count: 23