Identifying financial statement fraud with decision rules obtained from Modified Random Forest

Cited by: 23
Authors
An, Byungdae [1]
Suh, Yongmoo [1]
Affiliations
[1] Korea Univ, MIS, Sch Business, Seoul, South Korea
Keywords
Financial statement fraud; Random forest; Decision rules; Feature importance; Machine learning; Predictive model; DATA MINING TECHNIQUES; INFORMATION ASYMMETRY; CORPORATE GOVERNANCE; MANAGEMENT; TREE; CLASSIFICATION; COMPLEXITY
DOI
10.1108/DTA-11-2019-0208
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose: Financial statement fraud (FSF) committed by a company suggests that the company may not be in a healthy state. Detecting FSF is important because such companies tend to conceal unfavorable information, which causes great losses to various stakeholders. The objective of the paper is therefore to propose a novel approach to building a classification model for identifying FSF that achieves high classification performance and yields human-readable rules explaining why a company is likely to commit FSF.

Design/methodology/approach: To cope with the class imbalance problem, we prepare multiple sub-datasets and build a set of decision trees for each one. For each sub-dataset, we form a model by removing from its set every tree whose accuracy is below the average accuracy of all trees in that set. We then select, among these models, the one with the best accuracy and call it MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the corresponding company is likely to commit FSF.

Findings: Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the profit-related variables are among the most important indicators of FSF, and identified two new variables related to gross profit that had not been examined in previous studies on FSF.

Originality/value: This study proposes a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. It also suggests a new way to resolve the class imbalance problem.
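The procedure described under Design/methodology/approach can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: sub-datasets are formed by pairing all minority-class rows with an equal-sized random sample of majority-class rows, labels are 0/1, tree accuracy is measured on a held-out validation set, and depth-1 decision stumps stand in for full decision trees to keep the example short. All function names (`make_balanced_subsets`, `train_stump`, `build_mrf`, `explain`) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def make_balanced_subsets(X, y, n_subsets, rng):
    """Assumed scheme: all minority rows plus an equal-sized majority sample."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    subsets = []
    for _ in range(n_subsets):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

def train_stump(X, y, rng):
    """Depth-1 stand-in for a tree: bootstrap sample, one random feature."""
    n = len(X)
    boot = [rng.randrange(n) for _ in range(n)]
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({X[i][f] for i in boot}):
        for left, right in ((0, 1), (1, 0)):
            hits = sum((left if X[i][f] <= t else right) == y[i] for i in boot)
            if best is None or hits > best[0]:
                best = (hits, f, t, left, right)
    return best[1:]  # (feature, threshold, label_if_le, label_if_gt)

def stump_predict(stump, row):
    f, t, left, right = stump
    return left if row[f] <= t else right

def forest_predict(forest, row):
    """Majority vote over the retained stumps."""
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]

def n_correct(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y))

def build_mrf(X, y, X_val, y_val, n_subsets=5, n_trees=25, seed=0):
    rng = random.Random(seed)
    best_forest, best_acc = None, -1.0
    for Xs, ys in make_balanced_subsets(X, y, n_subsets, rng):
        trees = [train_stump(Xs, ys, rng) for _ in range(n_trees)]
        hits = [n_correct(lambda r, s=s: stump_predict(s, r), X_val, y_val)
                for s in trees]
        # Drop trees below the set's average accuracy (integer comparison
        # avoids floating-point rounding when all accuracies are equal).
        kept = [s for s, h in zip(trees, hits) if h * len(hits) >= sum(hits)]
        acc = n_correct(lambda r: forest_predict(kept, r), X_val, y_val) / len(y_val)
        if acc > best_acc:  # keep the best-performing pruned model
            best_forest, best_acc = kept, acc
    return best_forest, best_acc

def explain(forest, row):
    """Assumed rule extraction: list the stump rules that voted for the prediction."""
    label = forest_predict(forest, row)
    rules = set()
    for stump in forest:
        if stump_predict(stump, row) == label:
            f, t = stump[0], stump[1]
            op = "<=" if row[f] <= t else ">"
            rules.add(f"x[{f}] {op} {t} -> class {label}")
    return label, sorted(rules)
```

With real decision trees, each rule would be the root-to-leaf path followed by the instance rather than a single threshold test; the pruning and model-selection steps would stay the same.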
Pages: 235-255
Number of pages: 21