MLOps FMEA: A Proactive & Structured Approach to Mitigate Failures and Ensure Success for Machine Learning Operations

Authors
Paul, Abhishek [1]
Son, Roderick Y. [1]
Balodi, Shiv A. [2]
Crooks, Kenney [3]
Affiliations
[1] Northrop Grumman, Chief Data Off, 1 Space Pk Dr, Redondo Beach, CA 90034 USA
[2] Northrop Grumman, Chief Data Off, 2980 Fairview Pk Dr, Falls Church, VA 22042 USA
[3] Northrop Grumman, Reliabil & Model Based Sustainment, Aeronaut Sect, 2000 W NASA Blvd, Melbourne, FL 32901 USA
Keywords
Machine Learning; Technology Readiness Levels; Natural Language Processing; Failure Modes and Effects Analysis; Predictive Maintenance
DOI
10.1109/RAMS51492.2024.10457600
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Machine learning applications have seen an exponential rise in prevalence across many industries, including healthcare, banking, manufacturing, and defense. Although machine learning applications hold considerable potential, successful development and productionization are not assured. To prevent failures and ensure success, a Machine Learning Operations (MLOps) Failure Modes and Effects Analysis (FMEA) is proposed as a proactive, structured approach to risk identification and mitigation. The MLOps FMEA framework demonstrates an approach to enumerate, prioritize, and mitigate potential failure modes across the entire MLOps lifecycle, and it tailors the classical FMEA to the risk assessment needs of machine learning projects. This work proposes developing templated MLOps failure modes by using CRISP-ML(Q) as a standardized representation of the MLOps workflow to identify categories of MLOps failure modes, and the NIST Artificial Intelligence Risk Management Framework (AI RMF 1.0) as the basis for principled MLOps Design Patterns from which specific failure modes are derived. Together, these standards establish a methodical and comprehensive foundation for identifying templated failure modes in the MLOps lifecycle. This work also proposes adaptations to the classical FMEA workflow and risk prioritization to support the MLOps FMEA framework. To prioritize MLOps failure modes, MLOps-centric Severity, Occurrence, and Detection tables are proposed, Consequence Levels (Safe vs. Unsafe) are incorporated, and risks are categorized as intentional or unintentional failure modes. As a machine learning project transitions from a proof of concept to a production solution, the MLOps FMEA framework is applied at each Machine Learning Technology Readiness Level (MLTRL). The MLOps FMEA framework is demonstrated with a predictive maintenance case study.
This framework has aided the organization in increasing the successful delivery of impactful machine learning solutions to production, as well as providing the added benefit of increased machine learning awareness and maturity in the organizational culture.
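The Severity, Occurrence, and Detection ratings mentioned in the abstract can be illustrated with the classical FMEA Risk Priority Number (RPN = Severity × Occurrence × Detection). The sketch below is only an illustration under stated assumptions: the 1 to 10 rating scales, the example failure modes, their ratings, and every class and function name are hypothetical, and it does not reproduce the paper's MLOps-centric tables or its Consequence Levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet (not taken from the paper)."""
    name: str
    phase: str               # lifecycle phase label, illustrative only
    severity: int            # assumed 1-10 scale, 10 = most severe
    occurrence: int          # assumed 1-10 scale, 10 = most frequent
    detection: int           # assumed 1-10 scale, 10 = hardest to detect
    intentional: bool = False  # intentional vs. unintentional failure mode

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scale.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # Classical FMEA Risk Priority Number.
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes with made-up ratings.
modes = [
    FailureMode("Label leakage in features", "Data preparation", 8, 4, 5),
    FailureMode("Undetected data drift", "Monitoring", 7, 6, 8),
    FailureMode("Training data poisoning", "Data preparation", 9, 2, 7,
                intentional=True),
]

ranked = prioritize(modes)
for mode in ranked:
    kind = "intentional" if mode.intentional else "unintentional"
    print(f"{mode.rpn:4d}  {mode.name} ({mode.phase}, {kind})")
```

Ranking by a single product is the simplest classical scheme; the adaptations described in the abstract (Safe vs. Unsafe consequence levels, intentional vs. unintentional categories) would add further grouping on top of such a ranking.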
Pages: 7