Rule identification from Web pages by the XRML approach

Cited by: 19
Authors
Kang, J
Lee, JK
Affiliations
[1] Ajou Univ, Sch Business Adm, Suwon 443749, South Korea
[2] Korea Adv Inst Sci & Technol, Grad Sch Management, Seoul 130012, South Korea
Keywords
rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML;
DOI
10.1016/j.dss.2005.01.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the world of Web pages, there are oceans of documents in natural language texts and tables. To extract rules from Web pages and maintain consistency between them, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows the identification of rules on Web pages and generates the identified rules automatically. For this purpose, we have designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML), both as parts of XRML. RIML 2.0 is designed to identify rules not only from texts, but also from tables on Web pages, and to transform to the formal rules in RSML syntax automatically. While designing RIML 2.0, we considered the features of sharing variables and values, omitted terms, and synonyms. We have conducted an experiment to evaluate the potential benefit of the XRML approach with real world Web pages of Amazon.com, BarnesandNoble.com, and Powells.com. We found that 100.0% of the rules and 99.7% of the rule components could be identified and automatically generated if we do not count the statements for linkages, which generically do not exist on the Web pages. Since the linkage components occupy 11.2% of all components in the rule base, the overall limitation of automatic rule generation is 88.8%. In this setting, 88.5% of the overall rule components could be generated from the identified rules from the Web pages. The result provides solid proof that XRML can facilitate the extraction and maintenance of rules from Web pages while building expert systems in the Semantic Web environment. (c) 2005 Elsevier B.V. All rights reserved.
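The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 88.5% overall figure is the 99.7% identification rate applied to the 88.8% of components that are not linkage statements; the abstract does not spell out this derivation, and the variable names are illustrative only.

```python
# Figures quoted in the abstract, in percent.
linkage_share = 11.2           # share of rule components that are linkage statements
non_linkage_identified = 99.7  # share of non-linkage components identified and generated

# Upper bound on automatic generation: every component except the linkage ones.
upper_bound = 100.0 - linkage_share

# Assumed derivation (not stated in the abstract): the identification rate
# applied to the non-linkage share of the rule base.
overall = upper_bound * non_linkage_identified / 100.0

print(f"upper bound: {upper_bound:.1f}%")
print(f"overall generated: {overall:.1f}%")
```

Under that assumption the two computed values round to the 88.8% and 88.5% reported in the abstract.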
Pages: 205 - 227
Page count: 23
Related papers
50 in total
  • [31] A probabilistic approach for discovering authoritative Web pages
    Greco, G
    Greco, S
    Zumpano, E
    [J]. SECOND INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, VOL I, PROCEEDINGS, 2002, : 133 - 133
  • [32] Fuzzy classification of web pages based on the transposition-learning rule
    Zhang, Maoyuan
    Zou, Chunyan
    Lu, Zhengding
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2007, 44 (01): : 99 - 104
  • [33] AN INVESTIGATION OF CLUSTERING ALGORITHMS IN THE IDENTIFICATION OF SIMILAR WEB PAGES
    De Lucia, Andrea
    Risi, Michele
    Scanniello, Giuseppe
    Tortora, Genoveffa
    [J]. JOURNAL OF WEB ENGINEERING, 2009, 8 (04): : 346 - 370
  • [34] Automatic Identification of Temporal Information in Tourism Web Pages
    Weiser, Stephanie
    Laublet, Philippe
    Minel, Jean-Luc
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 127 - 131
  • [35] Web site improvements based on representative pages identification
    Ríos, SA
    Velásquez, JD
    Yasuda, H
    Aoki, T
    [J]. AI 2005: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2005, 3809 : 1162 - 1166
  • [36] To Extract Informative Content from online web pages by using Hybrid Approach
    Kaddu, Madhura R.
    Kulkarni, R. B.
    [J]. 2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 972 - 977
  • [37] Discovering image semantics from web pages using a text mining approach
    Yang, HC
    Lee, CH
    [J]. ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2003, 2762 : 495 - 502
  • [38] Extracting Templates from Web pages
    Manjula, R.
    Chilambuchelvan, A.
    [J]. 2013 INTERNATIONAL CONFERENCE ON GREEN COMPUTING, COMMUNICATION AND CONSERVATION OF ENERGY (ICGCE), 2013, : 788 - 791
  • [39] Information Extraction from Web pages
    Novotny, Robert
    Vojtas, Peter
    Maruscak, Dusan
    [J]. 2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 121 - +
  • [40] ENiD: An Encrypted Web Pages Traffic Identification Based on Web Visiting Behavior
    Ge, Mengmeng
    Yu, Xiangzhan
    Sachidananda, Vinay Mysore
    Liu, Shangqing
    Liu, Likun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 593 - 601