Discovering filter keywords for company name disambiguation in Twitter

Cited by: 21
Authors
Spina, Damiano [1]
Gonzalo, Julio
Amigo, Enrique
Affiliation
[1] UNED NLP, Madrid 28040, Spain
Keywords
Twitter; Online reputation management; Name disambiguation; Filtering
DOI
10.1016/j.eswa.2013.03.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context to disambiguate. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones actually refer to the company. Our approach relies on the identification of filter keywords: terms whose presence in a tweet reliably confirms (positive keywords) or rules out (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm classifies 58% of the tweets with 75% accuracy, and these tweets can then be used to feed a machine learning algorithm, which produces a complete classification of all tweets with an overall accuracy of 73%. In comparison, 10-fold cross-validation of the same machine learning algorithm yields an accuracy of 85%, i.e., our unsupervised algorithm has a 14% relative loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not derive directly from the public information about the company on the Web: a manual selection of keywords from relevant web sources covers only 15% of the tweets, with 86% accuracy; (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection. (C) 2013 Elsevier Ltd. All rights reserved.
Pages: 4986-5003
Number of pages: 18
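The abstract describes the filtering step only at a high level. As a minimal sketch, assuming the filter keywords are already given (the paper's keyword-discovery algorithm is not reproduced here), the snippet below shows how positive and negative keywords could label tweets. It also shows how the two figures the abstract reports, the share of tweets labeled (coverage, 58%) and the accuracy on that labeled subset (75%), would be computed. Finally, it checks the arithmetic behind the "14% relative loss": (0.85 - 0.73) / 0.85 ≈ 0.141. The keyword sets, tweets, gold labels, tokenization, and the rule that a tweet matching both kinds of keyword (or neither) stays unlabeled are all hypothetical assumptions, not the authors' method.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical filter keywords for the ambiguous name "apple" (illustrative only).
POSITIVE = {"iphone", "ipad", "mac", "ios"}
NEGATIVE = {"pie", "juice", "orchard", "recipe"}


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization, stripping leading/trailing punctuation."""
    return {tok.strip(".,!?:;\"'()#@") for tok in text.lower().split()}


def classify(tweet: str, positive: set[str], negative: set[str]) -> Optional[bool]:
    """Return True (refers to the company), False (does not), or None (unlabeled).

    Assumption: a tweet containing keywords of both kinds, or of neither,
    is left unlabeled rather than guessed.
    """
    tokens = tokenize(tweet)
    has_pos = bool(tokens & positive)
    has_neg = bool(tokens & negative)
    if has_pos and not has_neg:
        return True
    if has_neg and not has_pos:
        return False
    return None


def coverage_and_accuracy(
    tweets: list[str], gold: list[bool], positive: set[str], negative: set[str]
) -> tuple[float, float]:
    """Coverage = fraction of tweets labeled; accuracy is measured on labeled tweets only."""
    if not tweets:
        return 0.0, 0.0
    predictions = [classify(t, positive, negative) for t in tweets]
    labeled = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(labeled) / len(tweets)
    accuracy = sum(p == g for p, g in labeled) / len(labeled) if labeled else 0.0
    return coverage, accuracy


def relative_loss(unsupervised_acc: float, supervised_acc: float) -> float:
    """Accuracy lost, expressed as a fraction of the supervised accuracy."""
    return (supervised_acc - unsupervised_acc) / supervised_acc


if __name__ == "__main__":
    tweets = [
        "Just got the new iPhone from Apple!",
        "Grandma's apple pie recipe is the best",
        "apple juice or orange juice?",
        "Apple stock is up today",                   # no keyword: stays unlabeled
        "apple slices and mac and cheese for lunch",  # "mac" misfires as positive
    ]
    # Hypothetical gold labels: True = the tweet refers to Apple Inc.
    gold = [True, False, False, True, False]
    cov, acc = coverage_and_accuracy(tweets, gold, POSITIVE, NEGATIVE)
    print(f"coverage={cov:.2f} accuracy={acc:.2f}")  # coverage=0.80 accuracy=0.75
    # Figures from the abstract: 73% unsupervised vs. 85% supervised accuracy.
    print(f"relative loss={relative_loss(0.73, 0.85):.3f}")  # relative loss=0.141
```

The tweets left unlabeled here correspond to the ones the abstract hands over to a machine learning algorithm fed with the keyword-labeled tweets; that second step is omitted from the sketch.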