Interpretable machine learning models for failure cause prediction in imbalanced oil pipeline data

Cited by: 2
Authors
Awuku, Bright [1 ]
Huang, Ying [1 ]
Yodo, Nita [1 ]
Asa, Eric [1 ]
Affiliations
[1] North Dakota State Univ, Dept Civil Construct & Environm Engn, Fargo, ND 58102 USA
Funding
U.S. National Science Foundation
Keywords
energy; oil; machine learning; deep learning; interpretability; pipeline; failure
DOI
10.1088/1361-6501/ad3570
Chinese Library Classification
T [Industrial Technology]
Subject Classification
08
Abstract
Pipelines are critical arteries in the oil and gas industry and require massive capital investment to safely construct networks that transport hydrocarbons across diverse environments. However, these pipeline systems are prone to integrity failures, which result in significant economic losses and environmental damage. Accurate prediction of pipeline failure events using historical oil pipeline accident data enables asset managers to plan sufficient maintenance, rehabilitation, and repair activities to prevent catastrophic failures. However, learning the complex interdependencies between pipeline attributes and rare failure events presents several analytical challenges. This study proposes a novel machine learning (ML) framework to accurately predict pipeline failure causes on highly class-imbalanced data compiled by the United States Pipeline and Hazardous Materials Safety Administration (PHMSA). Natural language processing techniques were leveraged to extract informative features from unstructured text data. Furthermore, class imbalance in the dataset was addressed via oversampling and intrinsic cost-sensitive learning (CSL) strategies adapted for the multi-class case. Nine machine and deep learning architectures were benchmarked, with LightGBM demonstrating superior performance. The integration of CSL yielded an 86% F1 score and a 0.82 Cohen kappa score, significantly advancing prior research. This study leveraged a comprehensive Shapley additive explanations (SHAP) analysis to interpret the predictions of the LightGBM algorithm, revealing the key factors driving failure probabilities. Leveraging sentiment analysis allowed the models to capture a richer, more multifaceted representation of the textual data. This study developed a novel CSL approach that integrates domain knowledge regarding the varying cost impacts of misclassifying different failure types into ML models.
This research demonstrated an effective fusion of text insights from inspection reports with structured pipeline data that enhances model interpretability. The resulting AI modeling framework generated data-driven predictions of failure causes that could provide transportation agencies with actionable insights, enabling tailored preventative maintenance decisions to proactively mitigate emerging pipeline failures.
Pages: 18
Related Papers
50 records
  • [1] An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data
    Kokkotis, Christos
    Giarmatzis, Georgios
    Giannakou, Erasmia
    Moustakidis, Serafeim
    Tsatalas, Themistoklis
    Tsiptsios, Dimitrios
    Vadikolias, Konstantinos
    Aggelousis, Nikolaos
    DIAGNOSTICS, 2022, 12 (10)
  • [2] Interpretable machine learning models for crime prediction
    Zhang, Xu
    Liu, Lin
    Lan, Minxuan
    Song, Guangwen
    Xiao, Luzi
    Chen, Jianguo
    COMPUTERS ENVIRONMENT AND URBAN SYSTEMS, 2022, 94
  • [3] Advancing preeclampsia prediction: a tailored machine learning pipeline integrating resampling and ensemble models for handling imbalanced medical data
    Ma, Yinyao
    Lv, Hanlin
    Ma, Yanhua
    Wang, Xiao
    Lv, Longting
    Liang, Xuxia
    Wang, Lei
    BIODATA MINING, 18 (1)
  • [4] Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
    Wah, Yap Bee
    Ismail, Azlan
    Azid, Nur Niswah Naslina
    Jaafar, Jafreezal
    Aziz, Izzatdin Abdul
    Hasan, Mohd Hilmi
    Zain, Jasni Mohamad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): 4821-4841
  • [5] Interpretable machine learning prediction of all-cause mortality
    Qiu, Wei
    Chen, Hugh
    Dincer, Ayse Berceste
    Lundberg, Scott
    Kaeberlein, Matt
    Lee, Su-In
    COMMUNICATIONS MEDICINE, 2022, 2 (01)
  • [6] Dealing with imbalanced data for interpretable defect prediction
    Gao, Yuxiang
    Zhu, Yi
    Zhao, Yu
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 151
  • [7] Interpretable machine learning models for the prediction of all-cause mortality and time to death in hemodialysis patients
    Chen, Minjie
    Zeng, Youbing
    Liu, Mengting
    Li, Zhenghui
    Wu, Jiazhen
    Tian, Xuan
    Wang, Yunuo
    Xu, Yuanwen
    THERAPEUTIC APHERESIS AND DIALYSIS, 2024
  • [8] Interpretable machine learning models for concrete compressive strength prediction
    Hoang, Huong-Giang Thi
    Nguyen, Thuy-Anh
    Ly, Hai-Bang
    INNOVATIVE INFRASTRUCTURE SOLUTIONS, 2025, 10 (01)
  • [9] Interpretable Machine Learning Models for Prediction of UHPC Creep Behavior
    Zhu, Peng
    Cao, Wenshuo
    Zhang, Lianzhen
    Zhou, Yongjun
    Wu, Yuching
    Ma, Zhongguo John
    BUILDINGS, 2024, 14 (07)