Rule identification from Web pages by the XRML approach

Cited by: 19
Authors
Kang, J
Lee, JK
Affiliations
[1] Ajou Univ, Sch Business Adm, Suwon 443749, South Korea
[2] Korea Adv Inst Sci & Technol, Grad Sch Management, Seoul 130012, South Korea
Keywords
rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML;
DOI
10.1016/j.dss.2005.01.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the world of Web pages, there are oceans of documents in natural language texts and tables. To extract rules from Web pages and maintain consistency between them, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows the identification of rules on Web pages and generates the identified rules automatically. For this purpose, we have designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML), both as parts of XRML. RIML 2.0 is designed to identify rules not only from texts but also from tables on Web pages, and to transform them automatically into formal rules in RSML syntax. While designing RIML 2.0, we considered the features of sharing variables and values, omitted terms, and synonyms. We have conducted an experiment to evaluate the potential benefit of the XRML approach with real-world Web pages of Amazon.com, BarnesandNoble.com, and Powells.com. We found that 100.0% of the rules and 99.7% of the rule components could be identified and automatically generated if we do not count the statements for linkages, which generally do not exist on the Web pages. Since the linkage components occupy 11.2% of all components in the rule base, the upper limit of automatic rule generation is 88.8%. In this setting, 88.5% of the overall rule components could be generated from the identified rules from the Web pages. The result provides solid proof that XRML can facilitate the extraction and maintenance of rules from Web pages while building expert systems in the Semantic Web environment. (c) 2005 Elsevier B.V. All rights reserved.
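The percentages in the abstract can be checked with a short calculation. The sketch below uses only the figures reported above. The assumption that the overall 88.5% is the product of the non-linkage identification rate and the non-linkage share is one reading consistent with those figures, not something the abstract states explicitly, and the variable names are illustrative.

```python
# Figures taken from the abstract; the multiplicative relationship is an assumption.
linkage_share = 0.112           # linkage components as a share of all rule components
identified_non_linkage = 0.997  # share of non-linkage components identified

# Linkage statements are not on the Web pages, so they cap automatic generation.
upper_bound = 1.0 - linkage_share

# Share of all rule components generated, under the assumed relationship.
overall = identified_non_linkage * upper_bound

print(f"upper bound: {upper_bound:.1%}")
print(f"overall:     {overall:.1%}")
```

Under this reading, the two computed values round to the 88.8% and 88.5% reported in the abstract.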
Pages: 205-227
Page count: 23