Detecting Singleton Review Spammers Using Semantic Similarity

Cited by: 50
Authors
Sandulescu, Vlad [1, 3]
Ester, Martin [2]
Affiliations
[1] Adform, Copenhagen, Denmark
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[3] Trustpilot, Copenhagen, Denmark
Keywords
opinion spam; fake review detection; semantic similarity; aspect-based opinion mining; latent Dirichlet allocation
DOI
10.1145/2740908.2742570
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Online reviews have become an important resource for consumers making purchases. However, it is increasingly difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior work on the opinion spam problem mostly classified fake reviews using behavioral user patterns. It focused on prolific users who write more than a couple of reviews, discarding one-time reviewers. The number of singleton reviewers, however, is expected to be high for many review websites. While behavioral patterns are effective when dealing with elite users, for one-time reviewers the review text needs to be exploited. In this paper we tackle the problem of detecting fake reviews written by the same person under multiple names, with each review posted under a different name. We propose two methods to detect similar reviews and show that the results generally outperform the vectorial similarity measures used in prior work. The first method extends the semantic similarity between words to the review level. The second method is based on topic modeling and exploits the similarity of the reviews' topic distributions using two models: bag-of-words and bag-of-opinion-phrases. The experiments were conducted on reviews from three datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott dataset (800 reviews).
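The contrast the abstract draws between vectorial similarity and topic-distribution similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the use of Jensen-Shannon divergence to compare topic distributions is an assumption (the abstract does not name the measure), and the topic vectors are made-up values standing in for the output of a trained topic model such as LDA.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity, a simple vectorial baseline."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(p, q):
    """Assumed measure: 1 minus the Jensen-Shannon divergence (base 2).

    p and q are topic distributions of equal length that each sum to 1.
    The result lies in [0, 1], with 1 meaning identical distributions.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


if __name__ == "__main__":
    review_a = "the staff were friendly and the food was great"
    review_b = "great meal and very welcoming employees"
    # Hypothetical topic vectors; a real pipeline would infer these
    # from a topic model fitted on the review corpus.
    topics_a = [0.70, 0.20, 0.10]
    topics_b = [0.65, 0.25, 0.10]
    print("cosine:", cosine_similarity(review_a, review_b))
    print("topic similarity:", topic_similarity(topics_a, topics_b))
```

The two example reviews share few surface words, so a word-overlap measure scores them low even though they discuss the same aspects. Comparing topic distributions is one way to capture that kind of paraphrase.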
Pages: 971 - 976
Number of pages: 6
Related papers
50 in total
  • [21] Deriving similarity for Semantic Web using similarity graph
    JuHum Kwon
    O-Hoon Choi
    Chang-Joo Moon
    Soo-Hyun Park
    Doo-Kwon Baik
    Journal of Intelligent Information Systems, 2006, 26
  • [22] Similarity of Sentences With Contradiction Using Semantic Similarity Measures
    Prasad, M. Krishna Siva
    Sharma, Poonam
    COMPUTER JOURNAL, 2022, 65 (03) : 701 - 717
  • [23] Deriving similarity for Semantic Web using similarity graph
    Kwon, JuHum
    Choi, O-Hoon
    Moon, Chang-Joo
    Park, Soo-Hyun
    Baik, Doo-Kwon
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2006, 26 (02) : 149 - 166
  • [24] Information Extraction from Spam Emails using Stylistic and Semantic Features to Identify Spammers
    Halder, Soma
    Tiwari, Richa
    Sprague, Alan
    2011 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2011, : 104 - 107
  • [25] Detecting Crowdsourcing Spammers in Community Question Answering Websites
    Hao, Kaiqing
    Wang, Lei
    ADVANCES IN INTERNETWORKING, DATA & WEB TECHNOLOGIES, EIDWT-2017, 2018, 6 : 412 - 423
  • [26] A Review of Information Content Metric for Semantic Similarity
    Meng, Lingling
    Gu, Junzhong
    Zhou, Zili
    ADVANCES ON DIGITAL TELEVISION AND WIRELESS MULTIMEDIA COMMUNICATIONS, 2012, 331 : 299 - +
  • [27] Semantic Similarity for English and Arabic Texts: A Review
    Alian, Marwah
    Awajan, Arafat
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2020, 19 (04)
  • [28] A Review on the Determination of Semantic Similarity of Patent Documents
    Kayakoku, Ahmet
    Tufekci, Aslihan
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2024,
  • [29] Feature Extraction Using Semantic Similarity
    Aboelela, Eman M.
    Gad, Walaa
    Ismail, Rasha
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2019, 2020, 1058 : 82 - 91
  • [30] Probabilistic graphical model for detecting spammers in microblog websites
    Han, Zhongming
    Yang, Ke
    Xu, Fengmin
    Duan, Dagao
    INTERNATIONAL JOURNAL OF EMBEDDED SYSTEMS, 2016, 8 (01) : 12 - 23