Identifying financial statement fraud with decision rules obtained from Modified Random Forest

Cited by: 23
Authors
An, Byungdae [1]
Suh, Yongmoo [1]
Affiliation
[1] Korea Univ, MIS, Sch Business, Seoul, South Korea
Keywords
Financial statement fraud; Random forest; Decision rules; Feature importance; Machine learning; Predictive model; Data mining techniques; Information asymmetry; Corporate governance; Management; Tree; Classification; Complexity
DOI
10.1108/DTA-11-2019-0208
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Purpose: Financial statement fraud (FSF) suggests that the company committing it may not be financially healthy. Detecting FSF is important because such companies tend to conceal unfavorable information, which causes great losses to various stakeholders. The objective of the paper is therefore to propose a novel approach to building a classification model that identifies FSF, achieves high classification performance, and yields human-readable rules explaining why a company is likely to commit FSF.

Design/methodology/approach: Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; select a subset of that set as the model for the sub-dataset by removing every tree whose accuracy is below the average accuracy of all trees in the set; and then select, among these models, the one with the best accuracy. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the corresponding company is likely to commit FSF.

Findings: Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the profit-related variables are among the most important indicators of FSF, and they identified two new variables related to gross profit that had not been considered in previous studies on FSF.

Originality/value: This study proposed a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, the paper suggested a new way to resolve the class imbalance problem.
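
The tree-pruning and model-selection steps described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: one-split decision stumps stand in for full decision trees; each sub-dataset pairs all fraud rows with an equally sized random draw of non-fraud rows; accuracy is measured on a held-out validation set; and the "rules" reported for an instance are the splits of the kept stumps that agree with the majority vote. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean

def predict_stump(stump, x):
    # A stump is (feature index, threshold, sign); it predicts 1 (fraud) or 0.
    feat, thr, sign = stump
    return 1 if sign * (x[feat] - thr) > 0 else 0

def accuracy(predict, rows):
    return mean(1 if predict(x) == y else 0 for x, y in rows)

def fit_stump(rows, rng):
    # Stand-in for a decision tree: bootstrap sample plus one random feature.
    sample = [rng.choice(rows) for _ in rows]
    feat = rng.randrange(len(rows[0][0]))
    candidates = [(feat, x[feat], s) for x, _ in sample for s in (1, -1)]
    return max(candidates,
               key=lambda c: accuracy(lambda x: predict_stump(c, x), sample))

def forest_predict(trees, x):
    # Majority vote; ties go to class 0.
    votes = sum(predict_stump(t, x) for t in trees)
    return 1 if 2 * votes > len(trees) else 0

def balanced_subsets(rows, k, rng):
    # Assumption: all fraud rows plus an equal-size random draw of non-fraud rows.
    fraud = [r for r in rows if r[1] == 1]
    clean = [r for r in rows if r[1] == 0]
    return [fraud + rng.sample(clean, len(fraud)) for _ in range(k)]

def build_mrf(subsets, validation, n_trees, rng):
    best_trees, best_acc = None, -1.0
    for rows in subsets:
        trees = [fit_stump(rows, rng) for _ in range(n_trees)]
        accs = [accuracy(lambda x, t=t: predict_stump(t, x), validation)
                for t in trees]
        avg = mean(accs)
        # Remove every tree whose accuracy is below the set's average.
        kept = [t for t, a in zip(trees, accs) if a >= avg]
        acc = accuracy(lambda x: forest_predict(kept, x), validation)
        # Keep the pruned model with the best accuracy across sub-datasets.
        if acc > best_acc:
            best_trees, best_acc = kept, acc
    return best_trees, best_acc

def rule_text(stump, x):
    # Describe the branch of the stump that instance x actually follows.
    feat, thr, sign = stump
    fired = predict_stump(stump, x)
    if sign == 1:
        op = ">" if fired else "<="
    else:
        op = "<" if fired else ">="
    return f"x[{feat}] {op} {thr:.2f} -> class {fired}"

def explain(trees, x):
    # Assumption: report the rules of the stumps that agree with the vote.
    label = forest_predict(trees, x)
    rules = [rule_text(t, x) for t in trees if predict_stump(t, x) == label]
    return label, rules

rng = random.Random(0)

def make_rows(n_fraud, n_clean):
    # Synthetic data: feature 0 is informative, feature 1 is noise.
    rows = [((rng.gauss(1, 1), rng.gauss(0, 1)), 1) for _ in range(n_fraud)]
    rows += [((rng.gauss(-1, 1), rng.gauss(0, 1)), 0) for _ in range(n_clean)]
    return rows

train = make_rows(20, 200)        # imbalanced: 20 fraud vs 200 non-fraud
validation = make_rows(30, 30)
model, acc = build_mrf(balanced_subsets(train, 5, rng), validation,
                       n_trees=15, rng=rng)
label, rules = explain(model, validation[0][0])
print(len(model), acc, label, rules[:3])
```

Because the best tree in a set is never below the set's average, the pruned model always keeps at least one tree. In practice, full decision trees from a library such as scikit-learn would replace the stumps used here.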
Pages: 235-255
Number of pages: 21