Wikipedia HTML Structure Analysis for Ontology Construction

Cited by: 1

Authors:

Zarrad, Rim [1]

Doggaz, Narjes [2]

Zagrouba, Ezzedine [3]

Affiliations:

[1] Univ Manouba, Higher Inst Documentat, Lab LIMTIC, Ariana, Tunisia

[2] Univ Tunis El Manar, Fac Sci Tunisia, Lab LIPAH, Tunis, Tunisia

[3] Univ Tunis El Manar, Higher Inst Comp Sci, Lab LIMTIC, Tunis, Tunisia

Source:

KNOWLEDGE ORGANIZATION | 2018 / Vol. 45 / No. 02

Keywords:

taxonomic relations; concepts; extracted semantic relations; Wikipedia; ontology construction;

DOI:

10.5771/0943-7444-2018-2-108

Chinese Library Classification (CLC) number:

G25 [Library science, librarianship]; G35 [Information science, information work];

Discipline classification code:

1205 ; 120501 ;

Abstract:

Previously, the main problem of information extraction was to gather enough data. Today, the challenge is not to collect data but to interpret and represent them in order to deduce information. Ontologies are considered suitable solutions for organizing information. The classic methods for ontology construction from textual documents rely on natural language analysis and are generally based on statistical or linguistic approaches. However, these approaches do not consider the document structure, which provides additional knowledge. In fact, the structural organization of documents also conveys meaning. In this context, new approaches focus on document structure analysis to extract knowledge. This paper describes a methodology for ontology construction from web data and especially from Wikipedia articles. It focuses mainly on document structure in order to extract the main concepts and their relations. The proposed methods not only extract taxonomic and non-taxonomic relations but also give the labels describing non-taxonomic relations. The extraction of non-taxonomic relations is established by analyzing the title hierarchy in each document. Pattern matching is also applied in order to extract known semantic relations. We also propose applying a refinement to the extracted relations in order to keep only those that are relevant. The refinement process is performed by applying the transitive property, checking the nature of the relations and analyzing taxonomic relations having inverted arguments. Experiments have been performed on French Wikipedia articles related to the medical field. Ontology evaluation is performed by comparing it to gold standards.
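
As a rough illustration of what analyzing the title hierarchy of a document can look like, the sketch below collects the heading elements of an HTML page and pairs each heading with its nearest higher-level heading. It is not the authors' implementation: the names `HeadingCollector` and `candidate_relations` are hypothetical, the use of Python's standard `html.parser` module is an assumption, and the resulting pairs are only candidate relations that would still need the labeling and refinement steps the abstract mentions.

```python
from html.parser import HTMLParser

# Map heading tag names to their depth in the title hierarchy.
HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collects (level, text) for every <h1>..<h6>."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, title) in document order
        self._level = None   # level of the heading currently being read
        self._buf = []       # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = HEADING_TAGS[tag]
            self._buf = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading element.
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = "".join(self._buf).strip()
            if text:
                self.headings.append((self._level, text))
            self._level = None


def candidate_relations(html):
    """Return (parent_title, child_title) pairs from the heading hierarchy."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()

    stack = []   # headings that are still "open" ancestors
    pairs = []
    for level, title in parser.headings:
        # Drop headings at the same or a deeper level: they are not ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            pairs.append((stack[-1][1], title))
        stack.append((level, title))
    return pairs


if __name__ == "__main__":
    # Made-up sample page, loosely in the style of a medical article.
    sample = (
        "<h1>Diabète</h1><p>Texte.</p>"
        "<h2>Symptômes</h2><h2>Traitement</h2><h3>Insuline</h3>"
    )
    for parent, child in candidate_relations(sample):
        print(parent, "->", child)
```

The stack keeps only the currently open ancestors, so each heading is linked to exactly one parent even when heading levels are skipped.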

Citation

Pages: 108-124

Page count: 17

50 records in total

[1] Ontology-based HTML to XML conversion
Li, SJ
Ou, WJ
Yu, JQ
ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2005, 3739 : 888 - 893
[2] Logical structure analysis: From HTML to XML
Lee, Min-Hyung
Kim, Yeon-Seok
Lee, Kyong-Ho
COMPUTER STANDARDS & INTERFACES, 2007, 29 (01) : 109 - 124
[3] Managing knowledge on the Web - Extracting ontology from HTML Web
Du, Timon C.
Li, Feng
King, Irwin
DECISION SUPPORT SYSTEMS, 2009, 47 (04) : 319 - 331
[4] Information extraction from HTML tables base on domain ontology
Hsiao, SL
Chou, SC
Chang, LP
IKE'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2003, : 70 - 76
[5] HTML Violations and Where to Find Them: A Longitudinal Analysis of Specification Violations in HTML
Hantke, Florian
Stock, Ben
PROCEEDINGS OF THE 2022 22ND ACM INTERNET MEASUREMENT CONFERENCE, IMC 2022, 2022, : 358 - 373
[6] Advanced user profile agent using structure analysis of HTML document
Kwak, JH
Kim, K
Lee, CH
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL I AND II, 1999, : 319 - 323
[7] Rec.HTML: Declarative HTML
Reynders, Bob
Choi, Kwanghoon
COMPANION PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON THE ART, SCIENCE, AND ENGINEERING OF PROGRAMMING (PROGRAMMING 2021 COMPANION), 2021, : 1 - 5
[8] Analysis of the HTML to XML Conversion Method
Li Busheng
Hu Jingfang
PROCEEDINGS OF THE 2015 INTERNATIONAL SYMPOSIUM ON COMPUTERS & INFORMATICS, 2015, 13 : 64 - 69
[9] CONCEPTS EXTRACTION BASED ON HTML DOCUMENTS STRUCTURE
Zarrad, Rim
Doggaz, Narjes
Zagrouba, Ezzeddine
ICAART: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2012, : 503 - 506
[10] Analysis and Interpretation of Semantic HTML Tables
Yin, Wensheng
Guo, Feifei
Xu, Fan
Chen, Xiuguo
WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 71 - 79
