ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

Cited by: 4

Authors:

Oro, Ermelinda [1]

Ruffolo, Massimo [2]

Sacca, Domenico [1]

Affiliations:

[1] Univ Calabria, Dept Elect Comp & Syst Sci, I-87036 Arcavacata Di Rende, CS, Italy

[2] Italian Natl Res Council, High Performance Comp & Networking Inst, I-87036 Arcavacata Di Rende, CS, Italy

Source:

INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS | 2009 / Vol. 18 / No. 05

Keywords:

Ontology-based information extraction; knowledge representation and reasoning; ontology; semantics; logic programming; attribute grammar; augmented transition network; PDF document; SYSTEM;

DOI:

10.1142/S0218213009000354

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Information extraction is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because their completely unstructured nature poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when information is arranged in tabular form. In this way, a self-describing ontology expresses the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.
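
The idea of an ontology class that carries its own recognition rules can be illustrated with a minimal sketch. This is not the XONTO formalism: the abstract does not specify the descriptor language, so plain regular expressions are used here as a stand-in, and the names OntologyClass, descriptors, and populate, as well as the sample text, are hypothetical. The input string stands in for text already extracted from a PDF.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A class that carries its own recognition rules.

    Hypothetical stand-in for a class equipped with descriptors;
    regex patterns are an assumption of this sketch.
    """
    name: str
    descriptors: list = field(default_factory=list)  # regex pattern strings
    instances: list = field(default_factory=list)    # recognized objects

    def populate(self, text: str) -> None:
        """Apply every descriptor to the text and store each new match."""
        for pattern in self.descriptors:
            for match in re.finditer(pattern, text):
                value = match.group(0)
                if value not in self.instances:
                    self.instances.append(value)


# Hypothetical two-class ontology; each class describes how to find its own objects.
year = OntologyClass("Year", [r"\b(?:19|20)\d{2}\b"])
amount = OntologyClass("Amount", [r"\b\d+(?:\.\d+)? EUR\b"])

# Stand-in for text already extracted from a PDF document.
text = "Revenue 2008: 12.5 EUR. Revenue 2009: 14 EUR."
for cls in (year, amount):
    cls.populate(text)

print(year.instances)
print(amount.instances)
```

In this sketch the ontology both states what is to be extracted (the classes) and holds the rules that fill it (the patterns), which is the property the abstract attributes to self-describing ontologies.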


Pages: 673-695

Page count: 23
