Deceptive text detection using continuous semantic space models

Cited by: 7
Authors
Hernandez-Castaneda, Angel [1]
Calvo, Hiram [1]
Affiliation
[1] IPN, CIC, Av JD Batiz E MO de Mendizabal, Mexico City 07738, DF, Mexico
Keywords
Deception detection; continuous semantic space model; one-hot representation; linguistic inquiry and word count; syntactic n-grams;
DOI
10.3233/IDA-170882
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We identify deceptive text using four kinds of features: a continuous semantic space model based on latent Dirichlet allocation (LDA) topics, one-hot representation (OHR), syntactic information from syntactic n-grams (SN), and lexicon-based features from the Linguistic Inquiry and Word Count (LIWC) dictionary. We tested several combinations of these features to determine the best source(s) for deceptive text identification. By selecting the appropriate features, we obtained benchmark-level performance using a Naive Bayes classifier. We tested on three available corpora: 800 hotel reviews, 600 reviews about controversial topics, and 236 book reviews. Combining LDA features with OHR yielded the best results, with accuracy above 80% on all tested datasets. Unlike other reference works, this combination has the further advantage of not requiring language-specific resources (e.g., SN, LIWC). We also analyze which features indicate deceptive or truthful texts, finding that certain words can play different roles (sometimes even opposing ones) depending on the task being evaluated.
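The feature combination the abstract reports as best (LDA topic features merged with OHR, fed to a Naive Bayes classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of scikit-learn, the choice of `MultinomialNB` as the Naive Bayes variant, the number of topics, and the toy reviews and labels are all assumptions made here for the example.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# Made-up toy reviews, for illustration only (1 = deceptive, 0 = truthful).
texts = [
    "the hotel was amazing and the staff was amazing",
    "my stay was perfect, best hotel ever, truly perfect",
    "room was small but clean, breakfast ended at nine",
    "check-in took ten minutes and the elevator was slow",
]
labels = [1, 1, 0, 0]

N_TOPICS = 2  # assumed value; the abstract does not state a topic count

# Two feature sources concatenated side by side:
#   "ohr": binary word-presence vectors (one-hot style representation)
#   "lda": per-document topic proportions from an LDA model
features = FeatureUnion([
    ("ohr", CountVectorizer(binary=True)),
    ("lda", Pipeline([
        ("counts", CountVectorizer()),
        ("topics", LatentDirichletAllocation(n_components=N_TOPICS,
                                             random_state=0)),
    ])),
])

# Naive Bayes on the merged features (the multinomial variant is an
# assumption; both feature blocks are non-negative, as it requires).
model = Pipeline([("features", features), ("nb", MultinomialNB())])
model.fit(texts, labels)

fitted_features = model.named_steps["features"]
X = fitted_features.transform(texts)
vocab_size = len(fitted_features.transformer_list[0][1].vocabulary_)
predictions = model.predict(texts)

print("merged feature matrix shape:", X.shape)
print("predictions on the training texts:", list(predictions))
```

Neither feature block in this sketch relies on a parser or a hand-built lexicon, which is the sense in which the abstract describes the LDA and OHR combination as free of language-specific resources.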
Pages: 679 - 695
Number of pages: 17
Related papers
50 in total
  • [1] Semantic Features-Based Discourse Analysis Using Deceptive and Real Text Reviews
    Alawadh, Husam M.
    Alabrah, Amerah
    Meraj, Talha
    Rauf, Hafiz Tayyab
    [J]. INFORMATION, 2023, 14 (01)
  • [2] Deceptive Opinions Detection Using New Proposed Arabic Semantic Features
    Ziani, Amel
    Azizi, Nabiha
    Schwab, Didier
    Zenakhra, Djamel
    Aldwairi, Monther
    Chekkai, Nassira
    Zemmal, Nawel
    Salah, Marwa Hadj
    [J]. AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 29 - 36
  • [3] Using Siamese BiLSTM Models for Identifying Text Semantic Similarity
    Fradelos, Georgios
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    [J]. ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS. AIAI 2023 IFIP WG 12.5 INTERNATIONAL WORKSHOPS, 2023, 677 : 381 - 392
  • [4] Fake or real? The computational detection of online deceptive text
    Ball L.
    Elworthy J.
    [J]. Journal of Marketing Analytics, 2014, 2 (3) : 187 - 201
  • [5] Using word semantic concepts for plagiarism detection in text documents
    Chang, Chia-Yang
    Lee, Shie-Jue
    Wu, Chih-Hung
    Liu, Chih-Feng
    Liu, Ching-Kuan
    [J]. INFORMATION RETRIEVAL JOURNAL, 2021, 24 (4-5) : 298 - 321
  • [6] Semantic Representation Using Explicit Concept Space Models
    Shalaby, Walid
    Zadrozny, Wlodek
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4983 - 4984
  • [7] An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation
    Qin, Hongbo
    Zhang, Haodi
    Wang, Hai
    Yan, Yujin
    Zhang, Min
    Zhao, Wei
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (06)
  • [8] Using word semantic concepts for plagiarism detection in text documents
    Chia-Yang Chang
    Shie-Jue Lee
    Chih-Hung Wu
    Chih-Feng Liu
    Ching-Kuan Liu
    [J]. Information Retrieval Journal, 2021, 24 : 298 - 321
  • [9] Classify Arabic Text using Vector Space Models
    Hanandeh, Essam S.
    abu Awwad, Aref
    Khassawneh, Yazan
    [J]. 2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 465 - 476
  • [10] Integrating Text Classification into Topic Discovery Using Semantic Embedding Models
    Lezama-Sanchez, Ana Laura
    Vidal, Mireya Tovar
    Reyes-Ortiz, Jose A.
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (17)