Rule identification from Web pages by the XRML approach

Times cited: 19
Authors
Kang, J
Lee, JK
Affiliations
[1] Ajou Univ, Sch Business Adm, Suwon 443749, South Korea
[2] Korea Adv Inst Sci & Technol, Grad Sch Management, Seoul 130012, South Korea
Keywords
rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML
DOI
10.1016/j.dss.2005.01.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Web holds vast numbers of documents made up of natural-language text and tables. To extract rules from Web pages and to keep the pages and the extracted rules consistent with each other, we have developed the XRML (eXtensible Rule Markup Language) framework. XRML allows rules to be identified on Web pages and generates the identified rules automatically. For this purpose, we designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML); both are parts of XRML. RIML 2.0 is designed to identify rules not only in text but also in tables on Web pages, and to transform them automatically into formal rules in RSML syntax. In designing RIML 2.0, we considered the sharing of variables and values, omitted terms, and synonyms. We conducted an experiment with real-world Web pages from Amazon.com, BarnesandNoble.com, and Powells.com to evaluate the potential benefit of the XRML approach. We found that 100.0% of the rules and 99.7% of the rule components could be identified and automatically generated, provided that the statements for linkages, which by nature do not appear on the Web pages, are not counted. Since the linkage components account for 11.2% of all components in the rule base, the upper limit of automatic rule generation is 88.8%. In this setting, 88.5% of all rule components could be generated from the rules identified on the Web pages. The result provides solid proof that XRML can facilitate the extraction and maintenance of rules from Web pages when building expert systems in the Semantic Web environment. (c) 2005 Elsevier B.V. All rights reserved.
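The percentages reported in the abstract fit together as shown below. This is only a consistency check on the published figures, not a calculation given by the authors, and it assumes that the 99.7% rate applies to the non-linkage components only, as the abstract's wording suggests.

```latex
\begin{aligned}
\text{upper limit of automatic generation} &= 100\% - 11.2\% = 88.8\% \\
\text{overall share of components generated} &\approx 0.997 \times 88.8\% \approx 88.5\%
\end{aligned}
```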
Pages: 205-227
Number of pages: 23