Open Information Extraction from the Web

Cited by: 0
Authors
Banko, Michele [1]
Cafarella, Michael J. [1]
Soderland, Stephen [1]
Broadhead, Matt [1]
Etzioni, Oren [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Turing Ctr, Seattle, WA 98195 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
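The abstract describes tuples that are assigned a probability and indexed to support user queries. A minimal sketch of that idea follows. It is not TEXTRUNNER's implementation: the names `Extraction` and `TupleIndex`, the token-level inverted index, and the example tuples and scores are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One relational tuple (arg1, relation, arg2) with a confidence score."""
    arg1: str
    relation: str
    arg2: str
    probability: float


class TupleIndex:
    """Toy inverted index (hypothetical): lowercase token -> extraction ids."""

    def __init__(self):
        self._extractions = []
        self._postings = defaultdict(set)

    def add(self, extraction):
        # Index every whitespace-separated token of the tuple's three fields.
        ext_id = len(self._extractions)
        self._extractions.append(extraction)
        text = " ".join((extraction.arg1, extraction.relation, extraction.arg2))
        for token in text.lower().split():
            self._postings[token].add(ext_id)

    def query(self, *terms, min_probability=0.0):
        """Return tuples containing every term, highest probability first."""
        if not terms:
            return []
        id_sets = [self._postings.get(t.lower(), set()) for t in terms]
        matching = set.intersection(*id_sets)
        hits = [
            self._extractions[i]
            for i in matching
            if self._extractions[i].probability >= min_probability
        ]
        return sorted(hits, key=lambda e: e.probability, reverse=True)


# Invented example tuples and scores, for illustration only.
index = TupleIndex()
index.add(Extraction("Edison", "invented", "the phonograph", 0.93))
index.add(Extraction("Edison", "was born in", "Ohio", 0.71))
index.add(Extraction("Bell", "invented", "the telephone", 0.88))

results = index.query("invented", min_probability=0.9)
```

Filtering on `min_probability` at query time is one possible design choice; a real system could instead discard low-probability tuples before indexing them.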
Pages: 2670 - 2676
Number of pages: 7
Related papers
  • [1] Open Information Extraction from the Web
    Etzioni, Oren
    Banko, Michele
    Soderland, Stephen
    Weld, Daniel S.
    [J]. COMMUNICATIONS OF THE ACM, 2008, 51 (12) : 68 - 74
  • [2] Web Services for information extraction from the Web
    Habegger, B
    Quafafou, M
    [J]. IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, PROCEEDINGS, 2004, : 279 - 286
  • [3] Extraction and Visualization of Occupational Health and Safety Related Information from Open Web
    Dasgupta, Tirthankar
    Naskar, Abir
    Saha, Rupsa
    Dey, Lipika
    [J]. 2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 434 - 439
  • [4] An Open Relation Extraction System for Web Text Information
    Li, Huagang
    Liu, Bo
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (11):
  • [5] Syntactic Representation Learning for Open Information Extraction on Web
    Ru, Chengsen
    Tang, Jintao
    Li, Shasha
    Wang, Ting
    [J]. WWW'17 COMPANION: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2017, : 833 - 834
  • [6] Information Extraction from Web pages
    Novotny, Robert
    Vojtas, Peter
    Maruscak, Dusan
    [J]. 2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 121 - +
  • [7] Extraction of structural information from the web
    Murata, T
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 1204 - 1207
  • [8] From Open Information Extraction to Semantic Web: A Context Rule-Based Strategy
    Hernandez, Julio
    Lopez-Arevalo, Ivan
    Martinez-Rodriguez, Jose L.
    Aldana-Bobadilla, Edwyn
    [J]. MINING INTELLIGENCE AND KNOWLEDGE EXPLORATION, MIKE 2018, 2018, 11308 : 32 - 41
  • [9] Information extraction from multimedia web documents: an open-source platform and testbed
    Dupplaw, David Paul
    Matthews, Michael
    Johansson, Richard
    Boato, Giulia
    Costanzo, Andrea
    Fontani, Marco
    Minack, Enrico
    Demidova, Elena
    Blanco, Roi
    Griffiths, Thomas
    Lewis, Paul
    Hare, Jonathon
    Moschitti, Alessandro
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2014, 3 (02) : 97 - 111
  • [10] Linked Open Data Perspectives: Incorporating Linked Open Data into Information Extraction on the Web
    Adrian, Benjamin
    Dengel, Andreas
    [J]. IT-INFORMATION TECHNOLOGY, 2011, 53 (03) : 117 - 124