Open Information Extraction from the Web

Cited by: 0

Authors:

Banko, Michele [1]

Cafarella, Michael J. [1]

Soderland, Stephen [1]

Broadhead, Matt [1]

Etzioni, Oren [1]

Affiliations:

[1] Univ Washington, Dept Comp Sci & Engn, Turing Ctr, Seattle, WA 98195 USA

Source:

20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2007

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.


Pages: 2670-2676

Page count: 7

