Spotting Fake Reviews via Collective Positive-Unlabeled Learning

Cited by: 90
Authors
Li, Huayi [1 ]
Chen, Zhiyuan [1 ]
Liu, Bing [1 ]
Wei, Xiaokai [1 ]
Shao, Jidong [2 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA
[2] Dianping Inc, Shanghai, Peoples R China
Keywords
Spam Detection; Collective PU Learning;
DOI
10.1109/ICDM.2014.47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online reviews have become an increasingly important resource for decision making and product design. However, review systems are often targeted by opinion spamming. Although researchers have studied fake review detection with supervised learning for years, ground truth for large-scale datasets is still unavailable, and most existing supervised approaches are trained on pseudo fake reviews rather than real fake reviews. Working with Dianping, the largest Chinese review hosting site, we present the first reported work on fake review detection in Chinese, using reviews filtered by Dianping's fake review detection system. Dianping's algorithm has very high precision, but its recall is hard to know. This means that the reviews flagged by the system are almost certainly fake, but the remaining reviews (the unknown set) may not all be genuine. Since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabeled set. This calls for learning from positive and unlabeled examples (PU learning). By leveraging the intricate dependencies among reviews, users, and IP addresses, we first propose a collective classification algorithm called Multi-typed Heterogeneous Collective Classification (MHCC) and then extend it to Collective Positive and Unlabeled learning (CPU). Our experiments are conducted on real-life reviews of 500 restaurants in Shanghai, China. Results show that our proposed models markedly improve F1 scores over strong baselines in both PU and non-PU learning settings. Since our models use only language-independent features, they can be easily generalized to other languages.
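The collective idea in the abstract (reviews, users, and IP addresses influencing one another, with the filtered reviews as the only trusted labels) can be illustrated with a minimal sketch. This is a simplified label-propagation stand-in, not the paper's MHCC or CPU algorithm. The input format, the function name `collective_scores`, the per-review `local` scores (assumed to come from some content-based classifier), the equal weighting of user and IP neighbors, and the `alpha` parameter are all hypothetical. In keeping with the PU setting, unlabeled reviews are never assumed genuine: their scores can rise when they share a user or an IP address with filtered reviews.

```python
from collections import defaultdict


def collective_scores(reviews, positives, local, alpha=0.5, iters=20):
    """Hypothetical sketch: propagate fake-review scores over a review-user-IP graph.

    reviews:   dict review_id -> (user_id, ip_id)   (assumed input format)
    positives: set of review_ids flagged by the filter (treated as certainly fake)
    local:     dict review_id -> score in [0, 1] for unlabeled reviews,
               assumed to come from some local (content-based) classifier
    alpha:     weight on the local score versus the neighbor evidence
    """
    # Filtered reviews start (and stay) at 1.0; unlabeled ones start at their local score.
    score = {r: 1.0 if r in positives else local[r] for r in reviews}
    for _ in range(iters):
        # Aggregate current review scores onto their users and IP addresses.
        user_sum, user_cnt = defaultdict(float), defaultdict(int)
        ip_sum, ip_cnt = defaultdict(float), defaultdict(int)
        for r, (u, ip) in reviews.items():
            user_sum[u] += score[r]
            user_cnt[u] += 1
            ip_sum[ip] += score[r]
            ip_cnt[ip] += 1
        # Update only unlabeled reviews; positives are clamped at 1.0.
        new = {}
        for r, (u, ip) in reviews.items():
            if r in positives:
                new[r] = 1.0
                continue
            neighbor = 0.5 * (user_sum[u] / user_cnt[u] + ip_sum[ip] / ip_cnt[ip])
            new[r] = alpha * local[r] + (1 - alpha) * neighbor
        score = new
    return score


# Usage: r2 shares a user with the filtered review r1, r3 is isolated.
scores = collective_scores(
    {"r1": ("u1", "ip1"), "r2": ("u1", "ip2"), "r3": ("u2", "ip3")},
    positives={"r1"},
    local={"r2": 0.2, "r3": 0.2},
)
print(scores)  # r2 converges to about 0.36, r3 stays at 0.2
```

Because each update is an affine map whose dependence on the previous scores is damped by `1 - alpha`, the iteration converges for any `0 < alpha <= 1`. A real system would replace the fixed `local` scores and the simple averaging with learned relational classifiers.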
Pages: 899-904
Number of pages: 6
Related papers
50 in total
  • [31] Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization
    Wilton, Jonathan
    Koay, Abigail M. Y.
    Ko, Ryan K. L.
    Xu, Miao
    Ye, Nan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [32] Positive-Unlabeled Domain Adaptation
    Sonntag, Jonas
    Behrens, Gunnar
    Schmidt-Thieme, Lars
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 66 - 75
  • [34] Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric
    Kwon, Yongchan
    Kim, Wonyoung
    Sugiyama, Masashi
    Paik, Myunghee Cho
    MACHINE LEARNING, 2020, 109 (03) : 513 - 532
  • [35] PUe: Biased Positive-Unlabeled Learning Enhancement by Causal Inference
    Wang, Xutao
    Chen, Hanting
    Guo, Tianyu
    Wang, Yunhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [36] PULNS: Positive-Unlabeled Learning with Effective Negative Sample Selector
    Luo, Chuan
    Zhao, Pu
    Chen, Chen
    Qiao, Bo
    Du, Chao
    Zhang, Hongyu
    Wu, Wei
    Cai, Shaowei
    He, Bing
    Rajmohan, Saravanakumar
    Lin, Qingwei
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8784 - 8792
  • [37] Screening drug-target interactions with positive-unlabeled learning
    Peng, Lihong
    Zhu, Wen
    Liao, Bo
    Duan, Yu
    Chen, Min
    Chen, Yi
    Yang, Jialiang
    SCIENTIFIC REPORTS, 7
  • [38] A Positive-Unlabeled Learning Algorithm for Urban Flood Susceptibility Modeling
    Li, Wenkai
    Liu, Yuanchi
    Liu, Ziyue
    Gao, Zhen
    Huang, Huabing
    Huang, Weijun
    LAND, 2022, 11 (11)
  • [39] Positive-unlabeled learning in bioinformatics and computational biology: a brief review
    Li, Fuyi
    Dong, Shuangyu
    Leier, Andre
    Han, Meiya
    Guo, Xudong
    Xu, Jing
    Wang, Xiaoyu
    Pan, Shirui
    Jia, Cangzhi
    Zhang, Yang
    Webb, Geoffrey I.
    Coin, Lachlan J. M.
    Li, Chen
    Song, Jiangning
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [40] Deep Generative Positive-Unlabeled Learning under Selection Bias
    Na, Byeonghu
    Kim, Hyemi
    Song, Kyungwoo
    Joo, Weonyoung
    Kim, Yoon-Yeong
    Moon, Il-Chul
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 1155 - 1164