MLOps FMEA: A Proactive & Structured Approach to Mitigate Failures and Ensure Success for Machine Learning Operations

Authors
Paul, Abhishek [1]
Son, Roderick Y. [1]
Balodi, Shiv A. [2]
Crooks, Kenney [3]
Affiliations
[1] Northrop Grumman, Chief Data Off, 1 Space Pk Dr, Redondo Beach, CA 90034 USA
[2] Northrop Grumman, Chief Data Off, 2980 Fairview Pk Dr, Falls Church, VA 22042 USA
[3] Northrop Grumman, Reliabil & Model Based Sustainment, Aeronaut Sect, 2000 W NASA Blvd, Melbourne, FL 32901 USA
Keywords
Machine Learning; Technology Readiness Levels; Natural Language Processing; Failure Modes and Effects Analysis; Predictive Maintenance
DOI
10.1109/RAMS51492.2024.10457600
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Machine learning applications have risen sharply in prevalence across many industries, including healthcare, banking, manufacturing, and defense. Despite this potential, successful development and productionization of machine learning applications is not assured. To prevent failures and ensure success, a Machine Learning Operations (MLOps) Failure Modes and Effects Analysis (FMEA) is proposed as a proactive, structured approach to risk identification and mitigation. The MLOps FMEA framework provides an approach to enumerate, prioritize, and mitigate potential failure modes across the entire MLOps lifecycle, tailoring the classical FMEA to the risk assessment needs of machine learning projects. This work proposes developing templated MLOps failure modes by using CRISP-ML(Q), a standardized representation of the MLOps workflow, to identify categories of failure modes, and the NIST Artificial Intelligence Risk Management Framework (AI RMF 1.0) as the basis for principled MLOps Design Patterns from which specific failure modes are derived. Together, these standards provide a methodical and comprehensive foundation for establishing templated failure modes in the MLOps lifecycle. This work also proposes adaptations to the classical FMEA workflow and risk prioritization to support the MLOps FMEA framework: MLOps-centric Severity, Occurrence, and Detection tables are proposed, Consequence Levels (Safe vs. Unsafe) are incorporated, and risks are categorized as intentional or unintentional failure modes. As a machine learning project transitions from a proof of concept to a production solution, the MLOps FMEA framework is applied at each Machine Learning Technology Readiness Level (MLTRL). The framework is demonstrated with a predictive maintenance case study.
The framework has helped the organization increase the successful delivery of impactful machine learning solutions to production, with the added benefit of greater machine learning awareness and maturity in the organizational culture.
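For readers unfamiliar with classical FMEA prioritization: each failure mode is rated for Severity (S), Occurrence (O), and Detection (D), typically on 1 to 10 scales, and ranked by the Risk Priority Number RPN = S × O × D. The abstract does not give the paper's MLOps-centric tables, nor does it say exactly how Consequence Level and the intentional/unintentional category enter the ranking. The Python sketch below is therefore only an illustration under stated assumptions. It uses the classical RPN, assumes Unsafe failure modes are ranked ahead of Safe ones before RPN is compared, and records the intentional/unintentional category as a label only. The names `FailureMode`, `Consequence`, and `prioritize`, the 1 to 10 scales, and the example failure modes and ratings are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    """Consequence Level named in the abstract (Safe vs. Unsafe)."""
    SAFE = "safe"
    UNSAFE = "unsafe"


@dataclass(frozen=True)
class FailureMode:
    # Hypothetical record; the paper's actual templates are not reproduced here.
    name: str
    severity: int      # assumed classical scale: 1 (negligible) .. 10 (hazardous)
    occurrence: int    # assumed classical scale: 1 (remote) .. 10 (very likely)
    detection: int     # assumed classical scale: 1 (almost certain to detect) .. 10 (undetectable)
    consequence: Consequence
    intentional: bool  # True for deliberate failure modes (e.g. adversarial tampering)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..10 scale.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Classical Risk Priority Number: S x O x D (range 1..1000)."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Assumed ordering: Unsafe before Safe, then highest RPN first."""
    # False sorts before True, so Unsafe modes (key False) come first.
    return sorted(
        modes,
        key=lambda m: (m.consequence is not Consequence.UNSAFE, -m.rpn),
    )


if __name__ == "__main__":
    # Hypothetical failure modes and ratings, for illustration only.
    modes = [
        FailureMode("Training data not representative of the fleet", 7, 5, 6,
                    Consequence.SAFE, intentional=False),
        FailureMode("Sensor data poisoned upstream", 9, 2, 8,
                    Consequence.UNSAFE, intentional=True),
        FailureMode("Model drift undetected after deployment", 6, 6, 7,
                    Consequence.SAFE, intentional=False),
    ]
    for m in prioritize(modes):
        kind = "intentional" if m.intentional else "unintentional"
        print(f"{m.consequence.value:6} RPN={m.rpn:4}  {kind:13}  {m.name}")
```

Ranking by consequence first is one simple way to keep a low-RPN but unsafe failure mode from being buried under higher-RPN safe ones. The paper's actual adaptation of the prioritization may differ.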
Pages: 7