Discovering filter keywords for company name disambiguation in Twitter

Cited by: 21
Authors
Spina, Damiano [1]
Gonzalo, Julio
Amigo, Enrique
Affiliation
[1] UNED NLP, Madrid 28040, Spain
Keywords
Twitter; Online reputation management; Name disambiguation; Filtering
DOI
10.1016/j.eswa.2013.03.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context to disambiguate. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones refer to the company. Our approach relies on the identification of filter keywords: those whose presence in a tweet reliably confirms (positive keywords) or rules out (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm makes it possible to classify 58% of the tweets with 75% accuracy, and those tweets can be used to feed a machine learning algorithm to obtain a complete classification of all tweets with an overall accuracy of 73%. In comparison, a 10-fold validation of the same machine learning algorithm provides an accuracy of 85%, i.e., our unsupervised algorithm has a 14% loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not directly derive from the public information about the company on the Web: a manual selection of keywords from relevant web sources only covers 15% of the tweets with 86% accuracy; (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection. © 2013 Elsevier Ltd. All rights reserved.
Pages: 4986-5003
Number of pages: 18
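
The filter-keyword idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the paper extracts keywords automatically, whereas the keyword sets below are hand-picked for illustration, and the function names (`label_with_filter_keywords`, `coverage_and_accuracy`, `relative_loss`) and the toy tweets are hypothetical. The sketch only shows how positive and negative keywords label a subset of tweets, how coverage and accuracy over that subset might be measured, and that the 14% loss quoted in the abstract is consistent with the relative difference (85 - 73) / 85.

```python
from typing import Optional, Sequence, Set, Tuple


def label_with_filter_keywords(
    tweet: str, positive: Set[str], negative: Set[str]
) -> Optional[bool]:
    """Return True/False when filter keywords decide the tweet, else None.

    Assumption: naive lowercase whitespace tokenization; a tweet matching
    both keyword types (or neither) is left unlabeled.
    """
    tokens = set(tweet.lower().split())
    has_positive = bool(tokens & positive)
    has_negative = bool(tokens & negative)
    if has_positive and not has_negative:
        return True   # tweet is taken to refer to the company
    if has_negative and not has_positive:
        return False  # tweet is taken to refer to something else
    return None       # not covered by the filter keywords


def coverage_and_accuracy(
    tweets: Sequence[str],
    gold: Sequence[bool],
    positive: Set[str],
    negative: Set[str],
) -> Tuple[float, float]:
    """Fraction of tweets labeled, and accuracy over the labeled ones."""
    predictions = [label_with_filter_keywords(t, positive, negative) for t in tweets]
    covered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(covered) / len(tweets) if tweets else 0.0
    accuracy = sum(p == g for p, g in covered) / len(covered) if covered else 0.0
    return coverage, accuracy


def relative_loss(supervised: float, unsupervised: float) -> float:
    """Relative accuracy loss of the unsupervised setting."""
    return (supervised - unsupervised) / supervised


# Toy data for the ambiguous name "apple" (hypothetical examples).
tweets = [
    "new apple iphone launch today",
    "baked an apple pie tonight",
    "i love apple",
    "apple juice on my ipad",
]
gold = [True, False, True, False]
positive_keywords = {"iphone", "ipad"}
negative_keywords = {"pie", "juice"}

coverage, accuracy = coverage_and_accuracy(
    tweets, gold, positive_keywords, negative_keywords
)
loss = relative_loss(0.85, 0.73)
```

In the approach the abstract describes, the tweets labeled this way would then serve as training data for a machine learning classifier that labels the remaining, uncovered tweets.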