Open Information Extraction from the Web

Authors
Banko, Michele [1]
Cafarella, Michael J. [1]
Soderland, Stephen [1]
Broadhead, Matt [1]
Etzioni, Oren [1]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Turing Ctr, Seattle, WA 98195 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
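As a rough illustration of the paradigm the abstract describes (one pass over a corpus, relation-independent tuple extraction, a score per tuple, and an index for querying), the following is a minimal toy sketch. It is not TEXTRUNNER's method. The capitalization heuristic in `extract_tuple`, the scoring rule `1 - 0.5 ** count`, and the names `extract_tuple` and `build_index` are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def extract_tuple(sentence):
    """Toy extractor (assumed heuristic, not the paper's): split a sentence into
    capitalized arg1, lowercase relation phrase, capitalized arg2."""
    tokens = sentence.rstrip(".").split()
    parts = [[], [], []]  # arg1, relation, arg2
    slot = 0
    for tok in tokens:
        is_cap = tok[0].isupper()
        wants_cap = slot != 1  # slots 0 and 2 hold arguments
        if is_cap != wants_cap:
            slot += 1
            if slot > 2:
                return None  # more than three segments: give up
        parts[slot].append(tok)
    if slot != 2 or not all(parts):
        return None
    return tuple(" ".join(p) for p in parts)


def build_index(corpus):
    """Single pass over the corpus, then index scored tuples by relation."""
    counts = Counter()
    for sentence in corpus:
        triple = extract_tuple(sentence)
        if triple is not None:
            counts[triple] += 1

    index = defaultdict(list)
    for triple, count in counts.items():
        # Assumed toy score: each repeated mention halves the remaining doubt.
        score = 1 - 0.5 ** count
        index[triple[1]].append((triple, score))
    for entries in index.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(index)


corpus = [
    "Google acquired YouTube.",
    "Google acquired YouTube.",
    "Einstein was born in Ulm.",
    "this sentence yields nothing.",
]
index = build_index(corpus)
print(index["acquired"])
```

No relation name is supplied up front: `"acquired"` and `"was born in"` are discovered from the text itself, which is the point of the open setting. A realistic system would replace both the heuristic and the score with learned components.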
Pages: 2670 - 2676
Page count: 7