Cross-Lingual Classification of Crisis Data

Cited by: 11
Authors
Khare, Prashant [1]
Burel, Gregoire [1]
Maynard, Diana [2]
Alani, Harith [1]
Affiliations
[1] Open Univ, Knowledge Media Inst, Milton Keynes, Bucks, England
[2] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Source
Funding
EU Horizon 2020;
Keywords
Semantics; Cross-lingual; Multilingual; Crisis informatics; Tweet classification;
DOI
10.1007/978-3-030-00671-6_36
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current methods for classifying the relevance of posts to a crisis or set of crises typically struggle to deal with posts in different languages, and it is not viable during rapidly evolving crisis situations to train new models for each language. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated to a single language. We show that the addition of semantic features extracted from external knowledge bases improves accuracy over a purely statistical model.
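The train-on-one-language, test-on-another scenario described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the classifier is a small hand-written multinomial Naive Bayes, and the `CONCEPTS` dictionary is a hypothetical stand-in for an external knowledge base. The example posts are invented toy data, chosen so that the effect is visible, and say nothing about the paper's actual results.

```python
import math
from collections import Counter

# Hypothetical stand-in for an external knowledge base: words from any
# language map to a shared, language-independent concept label.
CONCEPTS = {
    "flood": "concept:flood", "inundación": "concept:flood",
    "earthquake": "concept:earthquake", "terremoto": "concept:earthquake",
    "rescue": "concept:rescue", "rescate": "concept:rescue",
}

def featurize(text, use_semantics):
    """Bag of lowercase tokens, optionally extended with concept labels."""
    tokens = text.lower().split()
    if use_semantics:
        tokens += [CONCEPTS[t] for t in tokens if t in CONCEPTS]
    return tokens

def train(examples, use_semantics):
    """Collect per-label document counts and token counts."""
    label_counts, token_counts = Counter(), {}
    for text, label in examples:
        label_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(
            featurize(text, use_semantics))
    vocab = set().union(*token_counts.values())
    return label_counts, token_counts, vocab

def predict(model, text, use_semantics):
    """Naive Bayes with add-one smoothing; unseen tokens are ignored."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    known = [t for t in featurize(text, use_semantics) if t in vocab]

    def score(label):
        denom = sum(token_counts[label].values()) + len(vocab)
        prior = math.log(label_counts[label] / total)
        return prior + sum(
            math.log((token_counts[label][t] + 1) / denom) for t in known)

    return max(label_counts, key=score)

def accuracy(train_set, test_set, use_semantics):
    model = train(train_set, use_semantics)
    hits = sum(predict(model, text, use_semantics) == label
               for text, label in test_set)
    return hits / len(test_set)

# Invented toy posts: train on English, test on Spanish.
EN_TRAIN = [
    ("flood rescue teams needed downtown", "related"),
    ("earthquake damage reported near the coast", "related"),
    ("great football match tonight", "unrelated"),
    ("new phone release tomorrow", "unrelated"),
    ("coffee with friends this morning", "unrelated"),
]
ES_TEST = [
    ("inundación en el centro rescate urgente", "related"),
    ("terremoto cerca de la costa", "related"),
    ("gran partido de fútbol esta noche", "unrelated"),
    ("nuevo teléfono mañana", "unrelated"),
]

baseline = accuracy(EN_TRAIN, ES_TEST, use_semantics=False)
with_concepts = accuracy(EN_TRAIN, ES_TEST, use_semantics=True)
print(baseline, with_concepts)
```

On this toy data the purely lexical model shares no tokens with the Spanish posts and falls back to the class prior, while the concept labels give the two languages a common feature space. A real system would replace `CONCEPTS` with lookups against an actual knowledge base or entity linker.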
Pages: 617-633
Page count: 17