Crowdsourcing and annotating NER for Twitter #drift

Authors
Fromreide, Hege [1]
Hovy, Dirk [1]
Sogaard, Anders [1]
Affiliations
[1] Univ Copenhagen, Ctr Language Technol, DK-1168 Copenhagen, Denmark
Keywords
NER; Twitter; crowdsourcing
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa = 0.942) and a set of 2,975 expert-corrected, crowdsourced, NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; and (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to "catch up" with language drift.
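The abstract reports inter-annotator agreement as kappa = 0.942 but does not say which kappa variant was used or at what unit agreement was measured. As a minimal sketch, assuming Cohen's kappa between two annotators over token-level BIO labels, agreement of this kind could be computed as below. The function name `cohen_kappa` and the toy label sequences are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: this is an illustration only; the record does not state
    which kappa variant the authors computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of positions where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal probabilities, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations of one eight-token tweet (not real data).
annotator_1 = ["O", "B-PER", "I-PER", "O", "B-LOC", "O", "O", "B-ORG"]
annotator_2 = ["O", "B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "B-ORG"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Because most tokens in a tweet carry the `O` label, chance agreement is high in NER annotation, which is why a chance-corrected measure such as kappa is more informative here than raw percentage agreement.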
Pages: 2544-2547
Page count: 4