Ontology-based information extraction and integration from heterogeneous data sources

Cited by: 58
Authors
Buitelaar, Paul [2 ]
Cimiano, Philipp [1 ]
Frank, Anette [3 ]
Hartung, Matthias [3 ]
Racioppa, Stefania [2 ]
Affiliations
[1] Univ Karlsruhe TH, Inst AIFB, D-76131 Karlsruhe, Germany
[2] DFKI GmbH, Language Technol Lab, D-66123 Saarbrucken, Germany
[3] Heidelberg Univ, Seminar Comp Linguist, D-69120 Heidelberg, Germany
Keywords
Ontology-based natural language processing; Information extraction; Knowledge integration; Question answering;
DOI
10.1016/j.ijhcs.2008.07.007
CLC number (Chinese Library Classification)
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
In this paper we present the design, implementation and evaluation of SOBA, a system for ontology-based information extraction from heterogeneous data resources, including plain text, tables and image captions. SOBA is capable of processing structured information, text and image captions to extract information and integrate it into a coherent knowledge base. To establish coherence, SOBA interlinks the information extracted from different sources and detects duplicate information. The knowledge base produced by SOBA can then be used to query for information contained in the different sources in an integrated and seamless manner. Overall, this allows for advanced retrieval functionality by which questions can be answered precisely. A further distinguishing feature of the SOBA system is that it straightforwardly integrates deep and shallow natural language processing to increase robustness and accuracy. We discuss the implementation and application of the SOBA system within the SmartWeb multimodal dialog system. In addition, we present a thorough evaluation of the different components of the system. However, an end-to-end evaluation of the whole SmartWeb system is out of the scope of this paper and has been presented elsewhere by the SmartWeb consortium. (C) 2008 Elsevier Ltd. All rights reserved.
Pages: 759-788
Number of pages: 30
Related papers
50 in total
  • [21] A Framework For Ontology-based Data Integration
    Li Dong
    Huang Linpeng
    [J]. ICICSE: 2008 INTERNATIONAL CONFERENCE ON INTERNET COMPUTING IN SCIENCE AND ENGINEERING, PROCEEDINGS, 2008, : 207 - 214
  • [22] Ontology-based product data integration
    Guo, M
    Li, SP
    Dong, JX
    Fu, XJ
    Hu, YJ
    Yin, QW
    [J]. AINA 2003: 17TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, 2003, : 530 - 533
  • [23] Ontology-based integration for relational data
    Dou, DJ
    LePendu, P
    [J]. ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2005: OTM 2005 WORKSHOPS, PROCEEDINGS, 2005, 3762 : 35 - 36
  • [24] Ontology-based information extraction from the World Wide Web
    Korst, Jan
    Geleijnse, Gijs
    de Jong, Nick
    Verschoor, Michael
    [J]. INTELLIGENT ALGORITHMS IN AMBIENT AND BIOMEDICAL COMPUTING, 2006, 7 : 149 - +
  • [25] Mediators over ontology-based information sources
    Tzitzikas, Y
    Spyratos, N
    Constantopoulos, P
    [J]. SECOND INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, VOL I, PROCEEDINGS, 2002, : 31 - 40
  • [26] Ontology-based integration of OLAP and information retrieval
    Priebe, T
    Pernul, G
    [J]. 14TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2003, : 610 - 614
  • [27] Ontology-based interactive information extraction from scientific abstracts
    Milward, D
    Bjäreland, M
    Hayes, W
    Maxwell, M
    Öberg, L
    Tilford, N
    Thomas, J
    Hale, R
    Knight, S
    Barnes, JE
    [J]. COMPARATIVE AND FUNCTIONAL GENOMICS, 2005, 6 (1-2): : 67 - 71
  • [28] ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO
    Oro, Ermelinda
    Ruffolo, Massimo
    Sacca, Domenico
    [J]. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2009, 18 (05) : 673 - 695
  • [29] An Ontology-Based Approach to Web Information Integration
    Zhang, Lin
    [J]. WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, : 87 - 90
  • [30] Ontology-based information integration in the automotive industry
    Maier, A
    Schnurr, HP
    Sure, Y
    [J]. SEMANTIC WEB - ISWC 2003, 2003, 2870 : 897 - 912