Process mining on noisy logs - Can log sanitization help to improve performance?

Cited by: 47
Authors
Cheng, Hsin-Jung [1]
Kumar, Akhil [2]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Ind Management, Taipei 106, Taiwan
[2] Penn State Univ, Smeal Coll Business, Dept Supply Chain & Informat Syst, University Pk, PA 16802 USA
Keywords
Process mining; Benchmarking; Noisy data; Log sanitization; Metrics; Rules; PROCESS MODELS; FRAMEWORK; PRISM;
DOI
10.1016/j.dss.2015.08.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Process mining techniques are designed to read process logs and extract process models from them. However, real-world logs are often noisy, and such logs produce poor, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log and then applying the classifier rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs from benchmark models of increasing complexity, using both behavioral and structural recall and precision metrics. The results show that mined models produced from such preprocessed logs are superior on several evaluation metrics. They show better fidelity to the reference models and are also more compact, with fewer elements. An advantage of the rule-based approach is that it generalizes to any noise pattern, which matters because the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset. (C) 2015 Elsevier B.V. All rights reserved.
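The sanitization idea in the abstract (learn rules from a labeled subset of the log, then apply them to drop noisy traces) can be illustrated with a minimal Python sketch. This is not the authors' classifier: the rule learner below is a deliberately simple stand-in that flags a directly-follows pair as a noise rule when it occurs only in traces labeled noisy. The function names, the `min_support` parameter, and the toy traces are all assumptions made for illustration.

```python
from collections import Counter


def pairs(trace):
    """Return the set of directly-follows pairs (a, b) in a trace."""
    return set(zip(trace, trace[1:]))


def learn_rules(labeled, min_support=1):
    """Learn noise rules from a labeled subset of the log.

    `labeled` is a list of (trace, is_noisy) tuples. A pair becomes a
    rule if it appears in at least `min_support` noisy traces and in
    no clean trace (a hypothetical, simplified rule learner).
    """
    noisy, clean = Counter(), Counter()
    for trace, is_noisy in labeled:
        (noisy if is_noisy else clean).update(pairs(trace))
    return {p for p, n in noisy.items() if n >= min_support and clean[p] == 0}


def sanitize(log, rules):
    """Keep only the traces that match none of the noise rules."""
    return [t for t in log if not (pairs(t) & rules)]


# Toy labeled subset (hypothetical data): two clean and two noisy traces.
labeled = [
    (("A", "B", "C", "D"), False),
    (("A", "C", "B", "D"), False),
    (("A", "D", "B", "C"), True),
    (("B", "A", "C", "D"), True),
]
rules = learn_rules(labeled)

# Apply the learned rules to the full log.
log = [
    ("A", "B", "C", "D"),
    ("A", "D", "B", "C"),
    ("A", "C", "B", "D"),
    ("B", "A", "C", "D"),
    ("A", "B", "D", "C"),
]
clean_log = sanitize(log, rules)
```

Because the rules are plain activity pairs, they can be inspected and edited by hand before sanitizing, which mirrors the explainability point made in the abstract. A trace containing only unseen pairs, such as the last one in `log`, is kept, since no learned rule matches it.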
Pages: 138-149
Page count: 12
Related papers
50 in total
  • [1] Optimal event log sanitization for privacy-preserving process mining
    Fahrenkrog-Petersen, Stephan A.
    van der Aa, Han
    Weidlich, Matthias
    [J]. DATA & KNOWLEDGE ENGINEERING, 2023, 145
  • [2] Bot Log Mining: Using Logs from Robotic Process Automation for Process Mining
    Egger, Andreas
    ter Hofstede, Arthur H. M.
    Kratsch, Wolfgang
    Leemans, Sander J. J.
    Roeglinger, Maximilian
    Wynn, Moe Thandar
    [J]. CONCEPTUAL MODELING, ER 2020, 2020, 12400 : 51 - 61
  • [3] Mining Process Performance from Event Logs
    Adriansyah, Arya
    Buijs, Joos C. A. M.
    [J]. BUSINESS PROCESS MANAGEMENT WORKSHOPS (BPM), 2013, 132 : 217 - 218
  • [4] Log mining to improve the performance of site search
    Xue, GR
    Zeng, HJ
    Chen, Z
    Ma, WY
    Lu, CJ
    [J]. WISE 2002: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING (WORKSHOPS), 2002, : 238 - 245
  • [5] Enhancement in Process Mining Model by Repairing Noisy Behavior in Event Log
    Shahzadi, Shabnam
    Emam, Walid
    Shahzad, Usman
    Iftikhar, Soofia
    Ahmad, Ishfaq
    Sharma, Gaurav
    [J]. IEEE ACCESS, 2024, 12 : 82938 - 82948
  • [6] Can imperfections help to improve bioreactor performance?
    Patnaik, PR
    [J]. TRENDS IN BIOTECHNOLOGY, 2002, 20 (04) : 135 - 137
  • [7] Repairing Event Logs to Enhance the Performance of a Process Mining Model
    Shahzadi, Shabnam
    Fang, Xianwen
    Shahzad, Usman
    Ahmad, Ishfaq
    Benedict, Troon
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [8] Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs
    Suriadi, S.
    Andrews, R.
    ter Hofstede, A. H. M.
    Wynn, M. T.
    [J]. INFORMATION SYSTEMS, 2017, 64 : 132 - 150
  • [9] Can measurement of results help improve the performance of schools?
    Anderson, B
    MacDonald, DS
    Sinnemann, C
    [J]. PHI DELTA KAPPAN, 2004, 85 (10) : 735 - 739
  • [10] Mining Web Logs with PLSA Based Prediction Model to Improve Web Caching Performance
    Huang, Chuibi
    Wang, Jinlin
    Deng, Haojiang
    Chen, Jun
    [J]. JOURNAL OF COMPUTERS, 2013, 8 (05) : 1351 - 1356