Interpretable machine learning models for failure cause prediction in imbalanced oil pipeline data

Cited by: 2
Authors
Awuku, Bright [1]
Huang, Ying [1]
Yodo, Nita [1]
Asa, Eric [1]
Affiliations
[1] North Dakota State Univ, Dept Civil Construct & Environm Engn, Fargo, ND 58102 USA
Funding
U.S. National Science Foundation
Keywords
energy; oil; machine learning; deep learning; interpretability; pipeline; failure
DOI
10.1088/1361-6501/ad3570
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Pipelines are critical arteries in the oil and gas industry and require massive capital investment to safely construct networks that transport hydrocarbons across diverse environments. However, these pipeline systems are prone to integrity failure, which results in significant economic losses and environmental damage. Accurate prediction of pipeline failure events using historical oil pipeline accident data enables asset managers to plan sufficient maintenance, rehabilitation, and repair activities to prevent catastrophic failures. However, learning the complex interdependencies between pipeline attributes and rare failure events presents several analytical challenges. This study proposes a novel machine learning (ML) framework to accurately predict pipeline failure causes on highly class-imbalanced data compiled by the United States Pipeline and Hazardous Materials Safety Administration. Natural language processing techniques were leveraged to extract informative features from unstructured text data. Furthermore, class imbalance in the dataset was addressed via oversampling and intrinsic cost-sensitive learning (CSL) strategies adapted for the multi-class case. Nine machine and deep learning architectures were benchmarked, with LightGBM demonstrating superior performance. The integration of CSL yielded an 86% F1 score and a 0.82 Cohen kappa score, significantly advancing prior research. This study leveraged a comprehensive SHapley Additive exPlanations (SHAP) analysis to interpret the predictions from the LightGBM algorithm, revealing the key factors driving failure probabilities. Leveraging sentiment analysis allowed the models to capture a richer, more multifaceted representation of the textual data. This study developed a novel CSL approach that integrates domain knowledge regarding the varying cost impacts of misclassifying different failure types into ML models.
This research demonstrated an effective fusion of text insights from inspection reports with structured pipeline data that enhances model interpretability. The resulting AI modeling framework generated data-driven predictions of the causes of failure that could provide transportation agencies with actionable insights. These insights enable tailored preventative maintenance decisions to proactively mitigate emerging pipeline failures.
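The abstract reports a Cohen kappa score and an F1 score on class-imbalanced data and mentions cost-sensitive learning. The sketch below is not the authors' implementation. It only illustrates, in plain Python, how these quantities are commonly computed: inverse-frequency class weights (one simple cost-sensitive heuristic, assumed here), Cohen's kappa, and a macro-averaged F1 (the averaging mode is an assumption; the abstract does not state it). The failure-cause labels and all function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction matches truth.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy labels (not taken from the PHMSA data).
y_true = ["corrosion"] * 6 + ["equipment"] * 3 + ["excavation"] * 1
y_pred = (["corrosion"] * 5 + ["equipment"]      # one corrosion case missed
          + ["equipment"] * 2 + ["corrosion"]    # one equipment case missed
          + ["excavation"])

weights = balanced_class_weights(y_true)  # rarer classes get larger weights
kappa = cohen_kappa(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Weights of this kind are typically passed to a classifier as per-class or per-sample costs, so that errors on rare failure causes are penalized more heavily than errors on common ones.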
Pages: 18