Large-Scale Machine Learning for Business Sector Prediction

Cited by: 4
Authors
Angenent, Mitch N. [1]
Barata, Antonio Pereira [1]
Takes, Frank W. [1]
Affiliations
[1] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
Keywords
business sector prediction; explainable machine learning; financial statements; data mining; FRAUD; SELECTION; MODEL;
DOI
10.1145/3341105.3374084
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study we use machine learning to perform explainable business sector prediction from financial statements. Financial statements are a valuable source of information on the financial state and performance of firms. Recently, large-scale data on financial statements has become available in the form of open data sets. Previous work on such data mainly focused on predicting fraud and bankruptcy. In this paper we devise a model for business sector prediction, which has several valuable applications, including automated error and fraud detection. In addition, such a predictive model may help in completing similar datasets with missing sector information. The proposed method employs a supervised learning approach based on random forests that addresses business sector prediction as a classification task. Using a dataset from the Netherlands Chamber of Commerce, containing over 1.5 million financial statements from Dutch companies, we created an adequately-performing model for business sector prediction. By assessing which features are instrumental in the final classification model, we found that a small number of attributes is crucial for predicting the majority of business sectors. Interestingly, in some cases the presence or absence of a feature was more important than the value itself. The resulting insights may also prove useful in accounting, where the relation between financial statements and characteristics of the company is a frequently studied topic.
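The abstract notes that the presence or absence of a feature was sometimes more informative than its value. A minimal sketch of one way to expose that signal to a classifier is shown below: each field becomes a value plus a presence flag. The field names and sample records are hypothetical, since the abstract does not list the dataset's attributes, and this is not necessarily the encoding the authors used.

```python
from typing import Dict, List, Optional

# Hypothetical field names; the real dataset's attributes are not given in the abstract.
FIELDS = ["total_assets", "inventories", "equity"]


def encode(statement: Dict[str, Optional[float]]) -> List[float]:
    """Return a flat list of (value, present_flag) pairs, one pair per field in FIELDS."""
    row: List[float] = []
    for name in FIELDS:
        value = statement.get(name)
        present = value is not None
        # A missing value is stored as 0.0; the flag keeps it distinct from a reported zero.
        row.append(float(value) if present else 0.0)
        row.append(1.0 if present else 0.0)
    return row


# Made-up example statements, for illustration only.
statements = [
    {"total_assets": 120000.0, "inventories": 45000.0, "equity": 30000.0},
    {"total_assets": 80000.0, "equity": -5000.0},  # no inventories reported
]
features = [encode(s) for s in statements]
```

Rows encoded this way could then be passed to a random forest classifier, which can split on the presence flag independently of the value.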
Pages: 1143-1146
Number of pages: 4
Related papers
50 in total
  • [1] Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence
    Capuccini, Marco
    Carlsson, Lars
    Norinder, Ulf
    Spjuth, Ola
    [J]. 2015 IEEE/ACM 2ND INTERNATIONAL SYMPOSIUM ON BIG DATA COMPUTING (BDC), 2015, : 61 - 67
  • [2] A Machine-Learning Approach for Communication Prediction of Large-Scale Applications
    Papadopoulou, Nikela
    Goumas, Georgios
    Koziris, Nectarios
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 120 - 123
  • [3] A Survey on Large-Scale Machine Learning
    Wang, Meng
    Fu, Weijie
    He, Xiangnan
    Hao, Shijie
    Wu, Xindong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2574 - 2594
  • [4] Large-scale comparison of machine learning algorithms for target prediction of natural products
    Liang, Lu
    Liu, Ye
    Kang, Bo
    Wang, Ru
    Sun, Meng-Yu
    Wu, Qi
    Meng, Xiang-Fei
    Lin, Jian-Ping
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)
  • [5] Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
    Wu, Jiangxia
    Chen, Yihao
    Wu, Jingxing
    Zhao, Duancheng
    Huang, Jindi
    Lin, MuJie
    Wang, Ling
    [J]. JOURNAL OF CHEMINFORMATICS, 16
  • [6] Large-scale comparison of machine learning methods for drug target prediction on ChEMBL
    Mayr, Andreas
    Klambauer, Guenter
    Unterthiner, Thomas
    Steijaert, Marvin
    Wegner, Jorg K.
    Ceulemans, Hugo
    Clevert, Djork-Arne
    Hochreiter, Sepp
    [J]. CHEMICAL SCIENCE, 2018, 9 (24) : 5441 - 5451
  • [7] Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
    Wu, Jiangxia
    Chen, Yihao
    Wu, Jingxing
    Zhao, Duancheng
    Huang, Jindi
    Lin, Mujie
    Wang, Ling
    [J]. JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [8] Machine learning for prediction of business company failure in hospitality sector
    Brito, Jose Henrique
    Pereira, Jose Manuel
    da Silva, Amelia Ferreira
    Angelico, Maria Jose
    Abreu, Antonio
    Teixeira, Sandrina
    [J]. ADVANCES IN TOURISM, TECHNOLOGY AND SMART SYSTEMS, 2020, 171 : 307 - 317
  • [9] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker
    Lee, Victor E.
    Shi, Feng
    Tang, Jiliang
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4788 - 4789
  • [10] Machine learning for large-scale MOF screening
    Coupry, Damien
    Groot, Laurens
    Addicoat, Matthew
    Heine, Thomas
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2017, 253