Fraud-Agents Detection in Online Microfinance: A Large-Scale Empirical Study

Cited by: 6
Authors
Wu, Yiming [1]
Xie, Zhiyuan [1]
Ji, Shouling [1]
Liu, Zhenguang [2]
Zhang, Xuhong [3]
Lin, Changting [4]
Deng, Shuiguang [1]
Zhou, Jun [5]
Wang, Ting [6]
Beyah, Raheem [7]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Gongshang Univ, Dept Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[3] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Zhejiang, Peoples R China
[4] Zhejiang Univ, Binjiang Inst, Hangzhou 310027, Zhejiang, Peoples R China
[5] Ant Grp, Hangzhou 310000, Zhejiang, Peoples R China
[6] Penn State Univ, Coll Informat Sci & Technol, State Coll, PA 16801 USA
[7] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
History; Feature extraction; Wireless fidelity; Social networking (online); Systematics; Peer-to-peer computing; Computer science; Fraud-agents detection; online lending; empirical study; CREDIT; RISK
DOI
10.1109/TDSC.2022.3151132
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Online microlending, a new financial service, focuses on small loans without any collateral. It provides more flexible and quicker funding for borrowers, as well as higher rates of return. For platforms that provide such services, an essential task is to evaluate each loan's risk adequately so as to minimize possible financial loss. However, there exists a special group of borrowers, namely fraud-agents, who gain illegal profits by inciting other borrowers to cheat, i.e., they help high-risk borrowers evade risk evaluation by crafting fake personal information. The existence of fraud-agents poses a severe threat to risk management systems and results in huge financial losses for lending platforms. In this article, we present the first machine learning-based solution to detect fraud-agents in online microlending. The key challenge of this decade-long problem is that it is unclear how to construct effective features from multiple behavior logs, such as the phone call history, address book, loan history, and activity logs of borrowers. To address this problem, we first conduct an empirical study on over 600K borrowers to gain insights into the adversarial behaviors of fraud-agents compared with normal borrowers and benign-agents. Based on the study, we design a total of 26 features, falling into four groups, for fraud-agent detection. Then, we propose a two-stage detection model to address the challenge of the limited number of labeled fraud-agent examples. The evaluation results show that our method achieves a precision of 94.30%. We deploy our method on a real, large online microlending platform with 11,953,273 borrowers and identify 29,727 fraud-agents among them. The domain experts from the platform confirm that 95.59% of them are real fraud-agents and have added them to the platform's internal blacklist. We further conduct a measurement study on those fraud-agents to share deeper insights into their adversarial behaviors.
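As a rough reading aid for the deployment figures quoted in the abstract, the short sketch below works out what they imply in absolute terms. The function name `summarize_deployment` is hypothetical and is not taken from the paper; the only inputs are the three numbers stated above. Because 95.59% is a rounded percentage, the confirmed count is an approximation and not a figure reported by the authors.

```python
def summarize_deployment(total_borrowers, flagged, confirmed_rate):
    """Derive two summary figures from the numbers quoted in the abstract.

    Hypothetical helper for illustration only; not part of the paper.
    """
    # Approximate number of flagged accounts that experts confirmed.
    confirmed = round(flagged * confirmed_rate)
    # Share of all borrowers on the platform that the method flagged.
    flagged_share = flagged / total_borrowers
    return confirmed, flagged_share


# Figures stated in the abstract: 11,953,273 borrowers, 29,727 flagged,
# 95.59% of the flagged accounts confirmed by domain experts.
confirmed, flagged_share = summarize_deployment(11_953_273, 29_727, 0.9559)
print(f"approximately confirmed fraud-agents: {confirmed}")
print(f"share of borrowers flagged: {flagged_share:.4%}")
```

Under these assumptions, roughly 28,400 of the flagged accounts were confirmed, and the flagged accounts make up about a quarter of one percent of the platform's borrowers.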
Pages: 1169-1185
Number of pages: 17