Unsupervised named-entity extraction from the Web: An experimental study

Cited by: 441
Authors
Etzioni, O [1]
Cafarella, M [1]
Downey, D [1]
Popescu, AM [1]
Shaked, T [1]
Soderland, S [1]
Weld, DS [1]
Yates, A [1]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
information extraction; pointwise mutual information; unsupervised; question answering;
DOI
10.1016/j.artint.2005.03.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., "chemist" and "biologist" are identified as sub-classes of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Pages: 91-134
Page count: 44
Related papers
50 in total
  • [1] Document Theme Extraction Using Named-Entity Recognition
    Nagrale, Deepali
    Khatavkar, Vaibhav
    Kulkarni, Parag
    [J]. COMPUTING, COMMUNICATION AND SIGNAL PROCESSING, ICCASP 2018, 2019, 810 : 499 - 509
  • [2] Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity
    Nadeau, David
    Turney, Peter D.
    Matwin, Stan
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4013 : 266 - 277
  • [3] Named-Entity Techniques for Terrorism Event Extraction and Classification
    Inyaem, Uraiwan
    Meesad, Phayung
    Haruechaiyasak, Choochart
    [J]. 2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 175 - +
  • [4] An Approach to Web-Scale Named-Entity Disambiguation
    Sarmento, Luis
    Kehlenbeck, Alexander
    Oliveira, Eugenio
    Ungar, Lyle
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 689 - +
  • [5] Ontology Extraction from Software Requirements Using Named-Entity Recognition
    Kocerka, Jerzy
    Krzeslak, Michal
    Galuszka, Adam
    [J]. ADVANCES IN SCIENCE AND TECHNOLOGY-RESEARCH JOURNAL, 2022, 16 (03) : 207 - 212
  • [6] A Survey of Named-Entity Recognition Methods for Food Information Extraction
    Popovski, Gorjan
    Seljak, Barbara Korousic
    Eftimov, Tome
    [J]. IEEE ACCESS, 2020, 8 : 31586 - 31594
  • [7] Ranking algorithms for named-entity extraction: Boosting and the voted perceptron
    Collins, M
    [J]. 40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2002, : 489 - 496
  • [8] Automatic Named-Entity Set Expansion from the Web Using a Mutual Importance Measure
    Ko, Youngjoong
    Bae, Sangjun
    [J]. INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (11B) : 5029 - 5040
  • [9] Applying machine learning for high-performance named-entity extraction
    Baluja, S
    Mittal, VO
    Sukthankar, R
    [J]. COMPUTATIONAL INTELLIGENCE, 2000, 16 (04) : 586 - 595
  • [10] Named-Entity Recognition from Greek and English Texts
    Vangelis Karkaletsis
    Georgios Paliouras
    Georgios Petasis
    Natasa Manousopoulou
    Constantine D. Spyropoulos
    [J]. Journal of Intelligent and Robotic Systems, 1999, 26 : 123 - 135