What We Can Learn from Looking at Profanity

Cited by: 0
Authors
Laboreiro, Gustavo [1]
Oliveira, Eugenio [1]
Affiliations
[1] Univ Porto, LIACC, Fac Engn, P-4100 Oporto, Portugal
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Profanity is common in online text. Recent studies found swear words in over 7% of English tweets and 9% of Yahoo! Buzz messages. However, efforts to recognize, understand and deal with profanity do not share resources, notably their datasets, which leads to duplicated effort and non-comparable results. We present a freely available dataset of 2500 messages from a popular Portuguese sports website. About 20% of the messages contained profanity; in them we annotated 726 swear words, 510 of which had been obfuscated by the message authors. We also identified the most frequent profanities and the methods, and combinations of methods, that people used to disguise their cursing.
Pages: 108-113
Page count: 6