Towards a System for Ontology-Based Information Extraction from PDF Documents

Authors
Oro, Ermelinda
Ruffolo, Massimo
Keywords
Ontology; Information Extraction; Attribute Grammars; Knowledge Representation; Datalog;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Ontologies make it possible to encode domain knowledge directly in software applications, so ontology-based systems can exploit the meaning of information to provide advanced and intelligent functionalities. One of the most interesting and promising applications of ontologies is information extraction from unstructured documents. In this area, the extraction of meaningful information from PDF documents has recently been recognized as an important and challenging problem. This paper proposes an ontology-based information extraction system for PDF documents founded on a well-suited knowledge representation approach named self-populating ontology (SPO). The SPO approach combines object-oriented, logic-based features with formal grammar capabilities and allows knowledge to be expressed in terms of ontology schemas, instances, and extraction rules (called descriptors) aimed at extracting information, including information in tabular form. The novel aspect of the SPO approach is that it allows ontologies to be enriched with rules that enable them to populate themselves with instances extracted from unstructured PDF documents. The paper proves the tractability of the SPO approach. Moreover, the features and behavior of the prototype implementation of the SPO system are illustrated by means of a running example.
Pages: 1482-1499
Page count: 18