An empirical study of the naïve REINFORCE algorithm for predictive maintenance

Cited by: 0

Authors:

Siraskar, Rajesh [1,5]

Kumar, Satish [1,2]

Patil, Shruti [1,2]

Bongale, Arunkumar [1]

Kotecha, Ketan [1,2,4]

Kulkarni, Ambarish [3]

Affiliations:

[1] Symbiosis Int Deemed Univ, Symbiosis Inst Technol, Pune Campus, Pune, India

[2] Symbiosis Int Deemed Univ, Symbiosis Ctr Appl Artificial Intelligence, Pune, India

[3] Swinburne Univ Technol, Hawthorn 3122, Australia

[4] RUDN Univ, People Friendship Univ Russia, Miklukho Maklaya Str 6, Moscow 117198, Russia

[5] Birlasoft Ltd, CTO Off, Pune 411057, India

Source:

DISCOVER APPLIED SCIENCES | 2025 / Vol. 7 / Issue 03

Keywords:

Reinforcement learning; Predictive maintenance; REINFORCE;

DOI:

10.1007/s42452-025-06613-1

Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Reinforcement Learning (RL) is a biologically inspired, autonomous machine learning method. RL algorithms can help generate optimal predictive maintenance (PdM) policies for complex industrial systems. However, these algorithms are extremely sensitive to hyperparameter tuning and network architecture, and this is where automated RL frameworks (AutoRL) can offer a platform to encourage industrial practitioners to apply RL to their problems. AutoRL applied to PdM has yet to be studied. Aimed at practitioners unfamiliar with complex RL tuning, we undertake an empirical study to understand untuned RL algorithms for generating optimal tool replacement policies for milling machines. We compare a naïve implementation of REINFORCE against the policies of industry-grade implementations of three advanced algorithms: Deep Q-Network (DQN), Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO). Our broad goal was to study model performance under four scenarios: (1) simulated tool-wear data, (2) actual tool-wear data (benchmark IEEE DataPort PHM Society datasets), (3) univariate state with added noise levels and a random chance of break-down, and finally (4) complex multivariate state. Across 15 environment variants, REINFORCE models demonstrated higher tool replacement precision (0.687), recall (0.629) and F1 (0.609) than A2C (0.449/0.480/0.442), DQN (0.418/0.504/0.374) and PPO (0.472/0.316/0.345), while demonstrating lower variability. Comparing the best auto-selected models over ten training rounds produced unusually wide performance gaps, with the REINFORCE precision, recall and F1 at 0.884, 0.884 and 0.873 against the best A2C (0.520/0.859/0.639), DQN (0.651/0.937/0.740), and PPO (0.558/0.643/0.580) models. For REINFORCE, we conduct a basic hyperparameter sensitivity and interaction analysis to better understand the dynamics, and present results for the learning rate, the discount factor $\gamma$, and the network activation functions (ReLU and Tanh). Our study suggests that, in the untuned state, simpler algorithms like REINFORCE perform reasonably well. For AutoRL frameworks, this research encourages seeking new design approaches to automatically identify optimum algorithm-hyperparameter combinations.
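
As an illustration of the role the discount factor $\gamma$ plays in REINFORCE, the following is a minimal sketch of the per-step discounted return that the textbook algorithm uses to weight its policy-gradient updates. It is not taken from the paper: the function name `discounted_returns` and the example reward values are assumptions made here for illustration, and the authors' implementation may differ.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode.

    Hypothetical helper for illustration; not the paper's code.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Illustrative rewards only (not data from the study).
example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(example)  # [1.75, 1.5, 1.0]
```

A smaller $\gamma$ makes later rewards count for less in the return at each step, which is one reason the choice of $\gamma$ can interact with the learning rate in a sensitivity analysis.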


Pages: 37
