Discovering filter keywords for company name disambiguation in Twitter

Cited by: 21
Authors
Spina, Damiano [1]
Gonzalo, Julio
Amigo, Enrique
Affiliations
[1] UNED NLP, Madrid 28040, Spain
Keywords
Twitter; Online reputation management; Name disambiguation; Filtering
DOI
10.1016/j.eswa.2013.03.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context to disambiguate. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones refer to the company. Our approach relies on the identification of filter keywords: terms whose presence in a tweet reliably confirms (positive keywords) or rules out (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm classifies 58% of the tweets with 75% accuracy, and those tweets can then be fed to a machine learning algorithm to obtain a complete classification of all tweets with an overall accuracy of 73%. In comparison, a 10-fold cross-validation of the same machine learning algorithm provides an accuracy of 85%, i.e., our unsupervised algorithm has a 14% relative loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not directly derive from the public information about the company on the Web: a manual selection of keywords from relevant web sources covers only 15% of the tweets, with 86% accuracy; and (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection. © 2013 Elsevier Ltd. All rights reserved.
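As a rough illustration of the filtering idea in the abstract, the following minimal Python sketch labels tweets using positive and negative filter keywords and reports coverage and accuracy over the tweets it decides. It is not the authors' algorithm: the paper discovers the keywords automatically, whereas here the keyword sets, the example tweets, the gold labels, and the function names `classify_tweet` and `coverage_and_accuracy` are all hypothetical and hand-written for the example.

```python
import re
from typing import List, Optional, Set, Tuple


def classify_tweet(text: str, positive: Set[str], negative: Set[str]) -> Optional[bool]:
    """Return True (refers to the company), False (does not), or None (undecided)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    has_positive = bool(tokens & positive)
    has_negative = bool(tokens & negative)
    if has_positive and not has_negative:
        return True
    if has_negative and not has_positive:
        return False
    # No filter keyword, or conflicting keywords: leave the tweet unclassified.
    return None


def coverage_and_accuracy(
    tweets: List[str], gold: List[bool], positive: Set[str], negative: Set[str]
) -> Tuple[float, float]:
    """Coverage = share of tweets decided; accuracy = share of decided tweets labeled correctly."""
    predictions = [classify_tweet(t, positive, negative) for t in tweets]
    decided = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(decided) / len(tweets) if tweets else 0.0
    accuracy = sum(p == g for p, g in decided) / len(decided) if decided else 0.0
    return coverage, accuracy


# Hypothetical keyword sets for the ambiguous name "apple" (assumed, not from the paper).
POSITIVE = {"iphone", "ipad", "macbook"}
NEGATIVE = {"pie", "juice", "orchard"}

tweets = [
    "new apple iphone announced today",      # positive keyword only
    "baking an apple pie tonight",           # negative keyword only
    "i love apple",                          # no filter keyword
    "apple juice and a macbook on my desk",  # conflicting keywords
]
gold = [True, False, True, True]  # hypothetical manual labels

coverage, accuracy = coverage_and_accuracy(tweets, gold, POSITIVE, NEGATIVE)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

In the approach the abstract describes, the tweets left undecided by the keywords would then be handled by a machine learning classifier fed with the keyword-labeled tweets; that step is omitted from this sketch.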
Pages: 4986-5003
Number of pages: 18