Crowdsourcing and annotating NER for Twitter #drift

Authors
Fromreide, Hege [1]
Hovy, Dirk [1]
Sogaard, Anders [1]
Affiliation
[1] Univ Copenhagen, Ctr Language Technol, DK-1168 Copenhagen, Denmark
Keywords
NER; Twitter; crowdsourcing
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa = 0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; and (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to "catch up" with language drift.
Pages: 2544-2547
Page count: 4