Using of n-grams from morphological tags for fake news classification

Cited by: 8
Authors
Kapusta, Jozef [1]
Drlik, Martin [1]
Munk, Michal [1, 2]
Affiliations
[1] Constantine Philosopher Univ Nitra, Dept Informat, Nitra, Slovakia
[2] Univ Pardubice, Sci & Res Ctr, Pardubice, Czech Republic
Keywords
Fake news identification; Text mining; Natural language processing; POS tagging; Morphological analysis;
DOI
10.7717/peerj-cs.624
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research into techniques for effective fake news detection has become both necessary and attractive. These techniques draw on many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging have been proven insufficient for fake news classification. However, over the last decade they have not presented any empirical research results that would confirm these statements experimentally. Considering this contradiction, the main aim of the paper is to experimentally evaluate the potential of the combined use of n-grams and POS tags for the correct classification of fake and true news. A dataset of published fake and real news about the current Covid-19 pandemic was pre-processed using morphological analysis. As a result, n-grams of POS tags were prepared and further analysed. Three techniques based on POS tags were proposed and applied to different groups of n-grams in the pre-processing phase of fake news detection. The n-gram size was examined first. Subsequently, the most suitable depth of the decision trees for sufficient generalization was determined. Finally, the performance measures of models based on the proposed techniques were compared with the standard reference TF-IDF technique. The performance measures considered were accuracy, precision, recall and F1-score, evaluated with 10-fold cross-validation. At the same time, the question of whether the TF-IDF technique can be improved using POS tags was examined in detail. The results showed that the newly proposed techniques are comparable with the traditional TF-IDF technique. Moreover, morphological analysis can improve the baseline TF-IDF technique: precision for fake news and recall for real news were statistically significantly improved.
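To make the pre-processing idea in the abstract concrete, the following is a minimal sketch of building n-grams over POS tags and weighting them with a plain TF-IDF scheme. It is an illustration under stated assumptions, not the authors' implementation: the function names `pos_ngrams` and `tfidf`, the toy tag sequences, the Universal POS tag labels, and the unsmoothed `log(N / df)` weighting are all hypothetical choices, and the abstract does not specify how the tags were encoded. Tagging itself and the decision tree classifier with 10-fold cross-validation are left out.

```python
from collections import Counter
from math import log


def pos_ngrams(tags, n):
    """Return all contiguous n-grams over a POS tag sequence, joined with '_'."""
    return ["_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def tfidf(docs_ngrams):
    """Weight n-gram frequencies by an unsmoothed log(N / df) inverse document frequency.

    This is a simplified textbook variant chosen for illustration; library
    implementations usually add smoothing and normalisation.
    """
    n_docs = len(docs_ngrams)
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for doc in docs_ngrams for g in set(doc))
    weighted = []
    for doc in docs_ngrams:
        if not doc:  # document shorter than n tags yields no n-grams
            weighted.append({})
            continue
        tf = Counter(doc)
        weighted.append(
            {g: (c / len(doc)) * log(n_docs / df[g]) for g, c in tf.items()}
        )
    return weighted


# Hypothetical, hand-tagged toy documents; not taken from the paper's dataset.
tagged_docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "DET", "NOUN"],
]
bigrams = [pos_ngrams(tags, 2) for tags in tagged_docs]
features = tfidf(bigrams)
```

In this sketch, a tag bigram that occurs in every document receives weight zero, so only bigrams that distinguish documents contribute to the feature vectors that a classifier would then consume.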
Pages: 27
Related papers
50 in total
  • [1] Fake News detection using n-grams for PAN@CLEF competition
    Damian, Sergio
    Calvo, Hiram
    Gelbukh, Alexander
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (05) : 4633 - 4640
  • [2] Fake News Identification: A Comparison of Parts-of-Speech and N-grams with Neural Networks
    Stoick, Brandon
    Snell, Nicholas
    Straub, Jeremy
    [J]. BIG DATA: LEARNING, ANALYTICS, AND APPLICATIONS, 2019, 10989
  • [3] We Will Know Them by Their Style: Fake News Detection Based on Masked N-Grams
    Perez-Santiago, Jennifer
    Villasenor-Pineda, Luis
    Montes-y-Gomez, Manuel
    [J]. PATTERN RECOGNITION, MCPR 2022, 2022, 13264 : 245 - 254
  • [4] Automatic Genre Classification via N-grams of Part-of-Speech Tags
    Tang, Xiaoyan
    Cao, Jing
    [J]. CURRENT WORK IN CORPUS LINGUISTICS: WORKING WITH TRADITIONALLY-CONCEIVED CORPORA AND BEYOND (CILC2015), 2015, 198 : 474 - 478
  • [5] Protein classification using modified n-grams and skip-grams
    Islam, S. M. Ashiqul
    Heil, Benjamin J.
    Kearney, Christopher Michel
    Baker, Erich J.
    [J]. BIOINFORMATICS, 2018, 34 (09) : 1481 - 1487
  • [6] Texture Image Classification Using Pixel N-grams
    Kulkarni, Pradnya
    Stranieri, Andrew
    Ugon, Julien
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2016, : 137 - 141
  • [7] Classification of Metamorphic Virus Using N-Grams Signatures
    Hamid, Isredza Rahmi A.
    Sani, Nur Sakinah Md
    Abdullah, Zubaile
    Foozy, Cik Feresa Mohd
    Kipli, Kuryati
    [J]. RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING (SCDM 2020), 2020, 978 : 140 - 149
  • [8] Fake news spreaders profiling using N-grams of various types and SHAP-based feature selection
    Balouchzahi, Fazlourrahman
    Sidorov, Grigori
    Shashirekha, Hosahalli Lakshmaiah
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (05) : 4437 - 4448
  • [9] Composer classification using melodic combinatorial n-grams
    Alvarez, Daniel Alejandro Perez
    Gelbukh, Alexander
    Sidorov, Grigori
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [10] Sentence Classification Using N-Grams in Urdu Language Text
    Awan, Malik Daler Ali
    Ali, Sikandar
    Samad, Ali
    Iqbal, Nadeem
    Missen, Malik Muhammad Saad
    Ullah, Niamat
    [J]. SCIENTIFIC PROGRAMMING, 2021, 2021