NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

Cited by: 0
Authors
Muhammad, Shamsuddeen Hassan [1, 2]
Adelani, David Ifeoluwa [3, 10]
Ruder, Sebastian [4 ]
Ahmad, Ibrahim Sa'id [5 ]
Abdulmumin, Idris [6 ]
Bello, Bello Shehu [5 ]
Choudhury, Monojit [7 ]
Emezue, Chris Chinenye [8 ]
Abdullahi, Saheed Salahudeen [9, 11]
Aremu, Anuoluwapo
Jorge, Alipio [2 ]
Brazdil, Pavel [1 ]
Affiliations
[1] LIAAD INESC TEC, Porto, Portugal
[2] Univ Porto, Fac Sci, Porto, Portugal
[3] Saarland Univ, Spoken Language Syst Grp LSV, Saarbrucken, Germany
[4] Google Res, Mountain View, CA USA
[5] Bayero Univ, Fac Comp Sci & Informat Technol, Kano, Nigeria
[6] Ahmadu Bello Univ, Dept Comp Sci, Zaria, Nigeria
[7] Microsoft Res India, Bengaluru, India
[8] Tech Univ Munich, Munich, Germany
[9] Kaduna State Univ, Nasarawa, Nigeria
[10] Masakhane NLP, Johannesburg, South Africa
[11] HausaNLP, Kano, Nigeria
Keywords
sentiment analysis; low-resource; Twitter corpus; natural language processing
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian-Pidgin, and Yoruba), consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing, and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.
Pages: 590-602
Number of pages: 13