Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences

Cited by: 0
Authors
Skalski, Piotr [1]
Sutton, David [1]
Burrell, Stuart [1]
Perez, Iker [1]
Wong, Jason [1]
Affiliations
[1] Featurespace, Innovat Lab, Cambridge, England
Keywords
transaction embeddings; self-supervised learning; generative modelling; multivariate time series; fraud detection;
DOI
10.1145/3604237.3626850
Chinese Library Classification (CLC) number
F8 [Public Finance, Finance];
Discipline classification code
0202;
Abstract
Machine learning models underpin many modern financial systems for use cases such as fraud detection and churn prediction. Most are based on supervised learning with hand-engineered features, which relies heavily on the availability of labelled data. Large self-supervised generative models have shown tremendous success in natural language processing and computer vision, yet so far they haven't been adapted to multivariate time series of financial transactions. In this paper, we present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions. Benchmarks on public datasets demonstrate that it outperforms state-of-the-art self-supervised methods on a range of downstream tasks. We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions and apply it to the card fraud detection problem on hold-out datasets. The embedding model significantly improves value detection rate at high precision thresholds and transfers well to out-of-domain distributions.
Pages: 141-149
Number of pages: 9
Related papers
50 in total
  • [41] A new deterministic process based generative model for characterizing bursty error sequences
    Wang, CX
    Pätzold, M
    2004 IEEE 15TH INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR AND MOBILE RADIO COMMUNICATIONS, VOLS 1-4, PROCEEDINGS, 2004, : 2134 - 2139
  • [42] Online Sound Structure Analysis Based on Generative Model of Acoustic Feature Sequences
    Imoto, Keisuke
    Ono, Nobutaka
    Niitsuma, Masahiro
    Yamashita, Yoichi
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1316 - 1321
  • [43] MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation
    Yu, Zhiping
    Liu, Chenyang
    Liu, Liqin
    Shi, Zhenwei
    Zou, Zhengxia
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (03) : 1764 - 1781
  • [44] Towards artificial general intelligence via a multimodal foundation model
    Nanyi Fei
    Zhiwu Lu
    Yizhao Gao
    Guoxing Yang
    Yuqi Huo
    Jingyuan Wen
    Haoyu Lu
    Ruihua Song
    Xin Gao
    Tao Xiang
    Hao Sun
    Ji-Rong Wen
    Nature Communications, 13
  • [45] Towards a Foundation Model for Geospatial Artificial Intelligence (Vision Paper)
    Mai, Gengchen
    Cundy, Chris
    Choi, Kristy
    Hu, Yingjie
    Lao, Ni
    Ermon, Stefano
    30TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS, ACM SIGSPATIAL GIS 2022, 2022, : 744 - 747
  • [46] Towards artificial general intelligence via a multimodal foundation model
    Fei, Nanyi
    Lu, Zhiwu
    Gao, Yizhao
    Yang, Guoxing
    Huo, Yuqi
    Wen, Jingyuan
    Lu, Haoyu
    Song, Ruihua
    Gao, Xin
    Xiang, Tao
    Sun, Hao
    Wen, Ji-Rong
    NATURE COMMUNICATIONS, 2022, 13 (01)
  • [47] Towards a general-purpose foundation model for computational pathology
    Chen, Richard J.
    Ding, Tong
    Lu, Ming Y.
    Williamson, Drew F. K.
    Jaume, Guillaume
    Song, Andrew H.
    Chen, Bowen
    Zhang, Andrew
    Shao, Daniel
    Shaban, Muhammad
    Williams, Mane
    Oldenburg, Lukas
    Weishaupt, Luca L.
    Wang, Judy J.
    Vaidya, Anurag
    Le, Long Phi
    Gerber, Georg
    Sahai, Sharifa
    Williams, Walt
    Mahmood, Faisal
    NATURE MEDICINE, 2024, 30 (03) : 850 - 862
  • [48] Towards the Use of Pretrained Language Model GPT-2 for Testing the Hypothesis of Communicative Efficiency in the Lexicon
    Zhang, Yuqing
    Li, Zhu
    Zhang, Jinsong
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 62 - 66
  • [49] Towards a Learning-Algorithm Agnostic Generative Policy Model for Coalitions
    Cunnington, Daniel
    Law, Mark
    de Mel, Geeth
    Manotas, Irene
    Bertino, Elisa
    Calo, Seraphin
    Verma, Darshika
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS, 2019, 11006
  • [50] Learning a Generative Motion Model From Image Sequences Based on a Latent Motion Matrix
    Krebs, Julian
    Delingette, Herve
    Ayache, Nicholas
    Mansi, Tommaso
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (05) : 1405 - 1416