Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis

Cited by: 13
Authors
Nilizadeh, Shirin [1]
Aghakhani, Hojjat [2]
Gustafson, Eric [2]
Kruegel, Christopher [2]
Vigna, Giovanni [2]
Affiliations
[1] Univ Texas Arlington, Arlington, TX 76019 USA
[2] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Keywords
Review Websites; Fraudulent Reviews and Campaigns; Cross-Dataset Analysis; Change-Point Analysis
DOI
10.1145/3308558.3313647
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
While online review services provide a two-way conversation between brands and consumers, malicious actors, including misbehaving businesses, have an equal opportunity to distort the reviews for their own gain. We propose OneReview, a method for locating fraudulent reviews by correlating data from multiple crowd-sourced review sites. Our approach uses Change Point Analysis to locate points at which a business's reputation shifts. Inconsistent trends in the reviews of the same business across multiple websites are used to identify suspicious reviews. We then extract an extensive set of textual and contextual features from these suspicious reviews and employ supervised machine learning to detect fraudulent reviews. We evaluated OneReview on about 805K Yelp reviews and 462K TripAdvisor reviews to identify fraud on Yelp. Supervised machine learning yields excellent results, with 97% accuracy. Applying the trained model to the suspicious reviews, we detected about 62K fraudulent reviews (about 8% of all Yelp reviews). We further analyzed the detected fraudulent reviews and their authors, and located several spam campaigns in the wild, including campaigns against specific businesses as well as campaigns consisting of several hundred socially networked, untrustworthy accounts.
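To illustrate the cross-site idea in the abstract, the following Python sketch finds a single reputation shift in one site's per-period average ratings using a standard CUSUM change-point estimator, then flags that shift when the other site shows no comparable movement over the same periods. This is not the authors' implementation: the CUSUM estimator is a generic stand-in for their Change Point Analysis, and the function names, the thresholds `min_shift` and `max_other`, and the sample ratings are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def cusum_change_point(series: Sequence[float]) -> Tuple[int, float]:
    """Locate the single most likely mean shift in a rating series.

    Returns (k, shift): k splits the series into series[:k] and series[k:],
    chosen to maximize |S_k| with S_k = sum_{i<k} (x_i - mean); shift is
    mean(series[k:]) - mean(series[:k]).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    mu = mean(series)
    best_k, best_abs, running = 1, -1.0, 0.0
    for k in range(1, len(series)):
        running += series[k - 1] - mu  # cumulative deviation from the overall mean
        if abs(running) > best_abs:
            best_k, best_abs = k, abs(running)
    return best_k, mean(series[best_k:]) - mean(series[:best_k])


def flag_inconsistent_shift(
    site_a: Sequence[float],
    site_b: Sequence[float],
    min_shift: float = 1.0,   # hypothetical: minimum star shift worth inspecting
    max_other: float = 0.25,  # hypothetical: largest shift tolerated on the other site
) -> Optional[int]:
    """Return site A's change-point index if site B does not move with it, else None.

    Both series are assumed to hold per-period average ratings for the same
    business over the same, aligned periods (e.g., months).
    """
    if len(site_a) != len(site_b):
        raise ValueError("series must cover the same periods")
    k, shift_a = cusum_change_point(site_a)
    if abs(shift_a) < min_shift:
        return None  # no notable reputation shift on site A
    shift_b = mean(site_b[k:]) - mean(site_b[:k])  # site B across the same split
    return k if abs(shift_b) <= max_other else None


if __name__ == "__main__":
    # Illustrative, made-up monthly averages (not data from the paper).
    yelp = [3.0, 3.1, 2.9, 3.0, 4.6, 4.7, 4.5, 4.8]
    tripadvisor = [3.0, 3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0]
    print(flag_inconsistent_shift(yelp, tripadvisor))  # → 4
```

A single CUSUM split finds only one shift per series; detecting several reputation shifts for the same business would need a multiple-change-point method such as binary segmentation, which this sketch does not attempt.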
Pages: 3108-3115
Number of pages: 8
Related papers
50 in total
  • [1] Lightning Talk - Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis
    Nilizadeh, Shirin
    Aghakhani, Hojjat
    Gustafson, Eric
    Kruegel, Christopher
    Vigna, Giovanni
    [J]. COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 1288 - 1289
  • [2] A Testbed for Cross-Dataset Analysis
    Tommasi, Tatiana
    Tuytelaars, Tinne
    [J]. COMPUTER VISION - ECCV 2014 WORKSHOPS, PT III, 2015, 8927 : 18 - 31
  • [3] Cross-dataset email classification
    Morales, Valentin
    Gomez, Juan Carlos
    Amerongen, Saskia Van
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2279 - 2290
  • [4] Cross-Dataset Action Detection
    Cao, Liangliang
    Liu, Zicheng
    Huang, Thomas S.
    [J]. 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, : 1998 - 2005
  • [5] Learning to Generalize Unseen Dataset for Cross-Dataset Palmprint Recognition
    Shao, Huikai
    Zou, Yuchen
    Liu, Chengcheng
    Guo, Qiang
    Zhong, Dexing
    [J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 3788 - 3799
  • [6] Cross-Dataset Face Manipulation Detection
    Bekci, Burak
    Akhtar, Zahid
    Ekenel, Hazim Kemal
    [J]. 2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [7] Cross-Dataset Learning for Age Estimation
    Zhang, Beichen
    Bao, Yue
    [J]. IEEE ACCESS, 2022, 10 : 24048 - 24055
  • [8] Cross-Dataset Learning of Visual Concepts
    Hentschel, Christian
    Sack, Harald
    Steinmetz, Nadine
    [J]. ADAPTIVE MULTIMEDIA RETRIEVAL: SEMANTICS, CONTEXT, AND ADAPTATION, AMR 2012, 2014, 8382 : 87 - 101
  • [9] Cross-Dataset Facial Expression Recognition
    Yan, Haibin
    Ang, Marcelo H., Jr.
    Poo, Aun Neow
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2011,
  • [10] Cross-Dataset Design Discussion Mining
    Mahadi, Alvi
    Tongay, Karan
    Ernst, Neil A.
    [J]. PROCEEDINGS OF THE 2020 IEEE 27TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER '20), 2020, : 149 - 160