On the Importance of Word Embedding in Automated Harmful Information Detection

Cited by: 1
Authors
Mohtaj, Salar [1, 2]
Moeller, Sebastian [1, 2]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] German Res Ctr Artificial Intelligence DFKI, Lab Berlin, Germany
Source
Keywords
Fake news detection; Hate speech detection; Word embedding; Contextual word embedding; HATE SPEECH; IDENTIFICATION;
DOI
10.1007/978-3-031-16270-1_21
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Social media have grown rapidly in recent years. They have changed many aspects of human life, especially how people communicate and how they access information. However, along with these important benefits, social media have caused a number of significant challenges since their introduction. The spread of fake news and hate speech is among the most challenging of these issues and has attracted a lot of attention from researchers in recent years. Different models based on natural language processing have been developed to combat these phenomena and stop them in the early stages, before mass spreading. Given the difficulty of automated harmful information detection (i.e., fake news and hate speech detection), every single step of the detection process can have a noticeable impact on model performance. In this paper, we study the importance of word embedding for the overall performance of deep neural network architectures in detecting fake news and hate speech on social media. We test various approaches for converting raw input text into vectors, from random weighting to state-of-the-art contextual word embedding models. In addition to comparing different word embedding approaches, we analyze different strategies for obtaining vectors from contextual word embedding models (i.e., taking the weights from the last layer versus averaging the weights of the last layers). Our results show that XLNet embedding outperforms the other embedding approaches on both tasks related to harmful information identification.
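The two vector extraction strategies named in the abstract (last layer only, versus an average over the last layers) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nested-list layout of the hidden states, the toy values, and the default of k = 4 averaged layers are all assumptions made for illustration, since the abstract does not state how many layers were averaged.

```python
from typing import List

Vector = List[float]
LayerOutput = List[Vector]        # one vector per token
HiddenStates = List[LayerOutput]  # one LayerOutput per model layer


def last_layer(hidden_states: HiddenStates) -> LayerOutput:
    """Strategy 1: use the token vectors of the final layer only."""
    return [list(vec) for vec in hidden_states[-1]]


def average_last_layers(hidden_states: HiddenStates, k: int = 4) -> LayerOutput:
    """Strategy 2: average each token's vectors over the last k layers.

    k = 4 is an illustrative default, not a value taken from the paper.
    """
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    selected = hidden_states[-k:]
    num_tokens = len(selected[0])
    dim = len(selected[0][0])
    return [
        [sum(layer[t][d] for layer in selected) / k for d in range(dim)]
        for t in range(num_tokens)
    ]


# Toy hidden states: 3 layers, 2 tokens, 2 dimensions (made-up values).
toy = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[2.0, 4.0], [3.0, 5.0]],
    [[4.0, 8.0], [5.0, 9.0]],
]
last = last_layer(toy)
avg = average_last_layers(toy, k=2)
```

With a real contextual model, the per-layer token vectors would come from the model's hidden states rather than from hand-written lists; either strategy then yields one vector per token to feed into the downstream classifier.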
Pages: 251-262
Number of pages: 12
Related papers
50 in total
  • [1] Automated Patent Classification Using Word Embedding
    Grawe, Mattyws F.
    Martins, Claudia A.
    Bonfante, Andreia G.
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 408 - 411
  • [2] Ontology-Based Enhanced Word Embedding for Automated Information Extraction from Geoscience Reports
    Qiu, Qinjun
    Xie, Zhong
    2018 26TH INTERNATIONAL CONFERENCE ON GEOINFORMATICS (GEOINFORMATICS 2018), 2018,
  • [3] Automated ICD Coding Based on Word Embedding with Entry Embedding and Attention Mechanism
    Zhang H.
    Fu Z.
    Ren Q.
    Xu H.
    Zhao D.
    Yan R.
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, 56 (01): : 1 - 8
  • [4] Applying Social Network Embedding and Word Embedding for Socialbots Detection
    Ting, I-Hsien
    Minetaki, Kazunori
    Hsu, Mei-Yun
    Yen, Chia-Sung
    PROCEEDINGS OF THE 2023 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2023, 2023, : 712 - 718
  • [5] Norm of Word Embedding Encodes Information Gain
    Oyama, Momose
    Yokoi, Sho
    Shimodaira, Hidetoshi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 2108 - 2130
  • [6] Enhancing Information Retrieval with Adapted Word Embedding
    Rekabsaz, Navid
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 1169 - 1169
  • [7] Event Detection on Literature by Utilizing Word Embedding
    Chun, Jiyun
    Kim, Chulyun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2020, 2020, 12115 : 258 - 266
  • [8] Clickbait Detection Based on Word Embedding Models
    Vorakitphan, Vorakit
    Leu, Fang-Yie
    Fan, Yao-Chung
    INNOVATIVE MOBILE AND INTERNET SERVICES IN UBIQUITOUS COMPUTING, IMIS-2018, 2019, 773 : 557 - 564
  • [9] CES2Vec: A Confidentiality-Oriented Word Embedding for Confidential Information Detection
    Jiang, Jianguo
    Lu, Yue
    Yu, Min
    Li, Gang
    Liu, Chao
    An, Shaohua
    Huang, Weiqing
    2020 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2020, : 289 - 295
  • [10] An Automated Word Embedding with Parameter Tuned Model for Web Crawling
    Neelakandan, S.
    Arun, A.
    Bhukya, Raghu Ram
    Hardas, Bhalchandra M.
    Kumar, T. Ch Anil
    Ashok, M.
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (03): : 1617 - 1632