Interpretable machine learning models for failure cause prediction in imbalanced oil pipeline data

Cited by: 2
Authors
Awuku, Bright [1 ]
Huang, Ying [1 ]
Yodo, Nita [1 ]
Asa, Eric [1 ]
Affiliations
[1] North Dakota State Univ, Dept Civil Construct & Environm Engn, Fargo, ND 58102 USA
Funding
U.S. National Science Foundation
Keywords
energy; oil; machine learning; deep learning; interpretability; pipeline; failure
DOI
10.1088/1361-6501/ad3570
Chinese Library Classification
T [Industrial Technology]
Subject Classification
08
Abstract
Pipelines are critical arteries in the oil and gas industry and require massive capital investment to safely construct networks that transport hydrocarbons across diverse environments. However, these pipeline systems are prone to integrity failures, which result in significant economic losses and environmental damage. Accurate prediction of pipeline failure events using historical oil pipeline accident data enables asset managers to plan sufficient maintenance, rehabilitation, and repair activities to prevent catastrophic failures. However, learning the complex interdependencies between pipeline attributes and rare failure events presents several analytical challenges. This study proposes a novel machine learning (ML) framework to accurately predict pipeline failure causes on highly class-imbalanced data compiled by the United States Pipeline and Hazardous Materials Safety Administration (PHMSA). Natural language processing techniques were leveraged to extract informative features from unstructured text data. Furthermore, class imbalance in the dataset was addressed via oversampling and intrinsic cost-sensitive learning (CSL) strategies adapted for the multi-class case. Nine machine and deep learning architectures were benchmarked, with LightGBM demonstrating superior performance. The integration of CSL yielded an 86% F1 score and a 0.82 Cohen kappa score, significantly advancing prior research. This study leveraged a comprehensive Shapley additive explanations (SHAP) analysis to interpret the predictions of the LightGBM algorithm, revealing the key factors driving failure probabilities. Leveraging sentiment analysis allowed the models to capture a richer, more multifaceted representation of the textual data. This study developed a novel CSL approach that integrates domain knowledge regarding the varying cost impacts of misclassifying different failure types into ML models.
This research demonstrated an effective fusion of text insights from inspection reports with structured pipeline data that enhances model interpretability. The resulting AI modeling framework generated data-driven predictions of failure causes that could provide transportation agencies with actionable insights, enabling tailored preventative maintenance decisions to proactively mitigate emerging pipeline failures.
Pages: 18
Related Papers
50 records
  • [1] An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data
    Kokkotis, Christos
    Giarmatzis, Georgios
    Giannakou, Erasmia
    Moustakidis, Serafeim
    Tsatalas, Themistoklis
    Tsiptsios, Dimitrios
    Vadikolias, Konstantinos
    Aggelousis, Nikolaos
    DIAGNOSTICS, 2022, 12 (10)
  • [2] Interpretable machine learning models for crime prediction
    Zhang, Xu
    Liu, Lin
    Lan, Minxuan
    Song, Guangwen
    Xiao, Luzi
    Chen, Jianguo
    COMPUTERS ENVIRONMENT AND URBAN SYSTEMS, 2022, 94
  • [3] Advancing preeclampsia prediction: a tailored machine learning pipeline integrating resampling and ensemble models for handling imbalanced medical data
    Ma, Yinyao
    Lv, Hanlin
    Ma, Yanhua
    Wang, Xiao
    Lv, Longting
    Liang, Xuxia
    Wang, Lei
    BIODATA MINING, 18 (1)
  • [4] Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
    Wah, Yap Bee
    Ismail, Azlan
    Azid, Nur Niswah Naslina
    Jaafar, Jafreezal
    Aziz, Izzatdin Abdul
    Hasan, Mohd Hilmi
    Zain, Jasni Mohamad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): 4821-4841
  • [5] Interpretable machine learning prediction of all-cause mortality
    Qiu, Wei
    Chen, Hugh
    Dincer, Ayse Berceste
    Lundberg, Scott
    Kaeberlein, Matt
    Lee, Su-In
    COMMUNICATIONS MEDICINE, 2022, 2 (01)
  • [6] Dealing with imbalanced data for interpretable defect prediction
    Gao, Yuxiang
    Zhu, Yi
    Zhao, Yu
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 151
  • [7] Interpretable machine learning models for the prediction of all-cause mortality and time to death in hemodialysis patients
    Chen, Minjie
    Zeng, Youbing
    Liu, Mengting
    Li, Zhenghui
    Wu, Jiazhen
    Tian, Xuan
    Wang, Yunuo
    Xu, Yuanwen
    THERAPEUTIC APHERESIS AND DIALYSIS, 2024
  • [8] Interpretable machine learning models for concrete compressive strength prediction
    Hoang, Huong-Giang Thi
    Nguyen, Thuy-Anh
    Ly, Hai-Bang
    INNOVATIVE INFRASTRUCTURE SOLUTIONS, 2025, 10 (01)
  • [9] Interpretable Machine Learning Models for Prediction of UHPC Creep Behavior
    Zhu, Peng
    Cao, Wenshuo
    Zhang, Lianzhen
    Zhou, Yongjun
    Wu, Yuching
    Ma, Zhongguo John
    BUILDINGS, 2024, 14 (07)