A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries

Cited by: 18
Authors
Cornolti, Marco [1]
Ferragina, Paolo [1]
Ciaramita, Massimiliano [2]
Rued, Stefan [3]
Schuetze, Hinrich [3]
Affiliations
[1] Univ Pisa, Pisa, Italy
[2] Google, Zurich, Switzerland
[3] Univ Munich, Munich, Germany
Funding
EU Horizon 2020
Keywords
Entity linking; query annotation; ERD; piggyback
DOI
10.1145/2872427.2883061
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we study the problem of linking open-domain web-search queries to entities drawn from the full entity inventory of Wikipedia articles. We introduce SMAPH-2, a second-order approach that, by piggybacking on a web search engine, alleviates the noise and irregularities that characterize the language of queries and puts queries in a larger context in which it is easier to make sense of them. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link those entities back to their mentions occurring in the input query. This allows us to confine the possible concepts pertinent to the query to only the ones really mentioned in it. The link-back is implemented via a collective disambiguation step based upon a supervised ranking model that makes one joint prediction for the annotation of the complete query, directly optimizing the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features, such as an approximate distance between mentions and entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves state-of-the-art performance on the ERD@SIGIR2014 benchmark. We also publish GERDAQ (General Entity Recognition, Disambiguation and Annotation in Queries), a novel, public dataset built specifically for web-query entity linking via a crowdsourcing effort. SMAPH-2 also outperforms the benchmarks by comparable margins on GERDAQ.
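The two-stage idea in the abstract (discover candidate entities, then link them back to query mentions) can be illustrated with a toy sketch. This is not the SMAPH-2 method: the supervised joint ranking model is replaced here by a greedy match on normalized Levenshtein distance, and the candidate list is hard-coded instead of being derived from search-engine results. All function names, the threshold `max_norm_dist`, and the example query are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def segments(query):
    """Yield every contiguous word span of the query as (start, end, text)."""
    words = query.split()
    for start, end in combinations(range(len(words) + 1), 2):
        yield start, end, " ".join(words[start:end])


def link_back(query, candidates, max_norm_dist=0.25):
    """Greedily bind candidate entity titles to non-overlapping query spans.

    Assumption: a candidate is 'mentioned' if some span is within
    max_norm_dist (edit distance / longer length) of its title.
    """
    scored = []
    for title in candidates:
        for start, end, text in segments(query.lower()):
            norm = edit_distance(text, title.lower()) / max(len(text), len(title))
            if norm <= max_norm_dist:
                scored.append((norm, start, end, title))
    scored.sort()  # closest matches first
    used, links = set(), []
    for _, start, end, title in scored:
        span = set(range(start, end))
        if span & used:
            continue  # skip spans overlapping an already linked mention
        used |= span
        links.append((start, end, title))
    return sorted(links)


def f1(predicted, gold):
    """F1 between two sets of (start, end, entity) annotations."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidates, standing in for entities found via web search.
links = link_back("barak obama wife", ["Barack Obama", "Michelle Obama", "White House"])
print(links)
```

In this sketch the misspelled span "barak obama" still binds to "Barack Obama", while candidates with no close span in the query are discarded, which mirrors the goal of keeping only entities actually mentioned in the query.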
Pages: 567-578
Page count: 12
Related papers
50 items in total
  • [41] Neural Transition Based Parsing of Web Queries: An Entity Based Approach
    Malca, Rivka
    Reichart, Roi
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2700 - 2710
  • [42] Boosting Entity Mention Detection for Targetted Twitter Streams with Global Contextual Embeddings
    Bhowmick, Satadisha Saha
    Dragut, Eduard C.
    Meng, Weiyi
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 1085 - 1097
  • [43] Person Entity Linking in Email With NIL Detection
    Gao, Ning
    Dredze, Mark
    Oard, Douglas W.
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2017, 68 (10) : 2412 - 2424
  • [44] Trend and behavior detection from web queries
    Wang, PL
    Bownas, J
    Berry, MW
    SURVEY OF TEXT MINING: CLUSTERING, CLASSIFICATION, AND RETRIEVAL, 2004, : 173 - 183
  • [45] Joint Multilingual Supervision for Cross-lingual Entity Linking
    Upadhyay, Shyam
    Gupta, Nitish
    Roth, Dan
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2486 - 2495
  • [46] A joint model for entity boundary detection and entity span recognition
    Nian, Yongming
    Chen, Yanping
    Qin, Yongbin
    Huang, Ruizhang
    Tang, Ruixue
    Hu, Ying
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (10) : 8362 - 8369
  • [47] Entity Ranking for Queries with Modifiers Based on Knowledge Bases and Web Search Results
    Imrattanatrai, Wiradee
    Kato, Makoto P.
    Tanaka, Katsumi
    Yoshikawa, Masatoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09) : 2279 - 2290
  • [48] DWESM: An efficient entity-level search mechanism for deep web queries
    Kou, Yue
    Shen, Derong
    Nie, Tiezheng
    Yu, Ge
    Journal of Computational Information Systems, 2010, 6 (01) : 237 - 244
  • [49] Learning Web Queries for Retrieval of Relevant Information about an Entity in a Wikipedia Category
    Yadav, Vikrant
    Kumar, Sandeep
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16 COMPANION), 2016, : 1013 - 1014
  • [50] THINKER - Entity Linking System for Turkish Language
    Kalender, Murat
    Korkmaz, Emin Erkan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (02) : 367 - 380