An empirical study of the naïve REINFORCE algorithm for predictive maintenance

Cited by: 0

Authors:

Siraskar, Rajesh [1,5]

Kumar, Satish [1,2]

Patil, Shruti [1,2]

Bongale, Arunkumar [1]

Kotecha, Ketan [1,2,4]

Kulkarni, Ambarish [3]

Affiliations:

[1] Symbiosis Int Deemed Univ, Symbiosis Inst Technol, Pune Campus, Pune, India

[2] Symbiosis Int Deemed Univ, Symbiosis Ctr Appl Artificial Intelligence, Pune, India

[3] Swinburne Univ Technol, Hawthorn 3122, Australia

[4] RUDN Univ, People Friendship Univ Russia, Miklukho Maklaya Str 6, Moscow 117198, Russia

[5] Birlasoft Ltd, CTO Off, Pune 411057, India

Source:

DISCOVER APPLIED SCIENCES | 2025 / Volume 7 / Issue 03

Keywords:

Reinforcement learning; Predictive maintenance; REINFORCE;

DOI:

10.1007/s42452-025-06613-1

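Chinese Library Classification:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];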

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Reinforcement Learning (RL) is a biologically inspired, autonomous machine learning method. RL algorithms can help generate optimal predictive maintenance (PdM) policies for complex industrial systems. However, these algorithms are extremely sensitive to hyperparameter tuning and network architecture, and this is where automated RL frameworks (AutoRL) can offer a platform that encourages industrial practitioners to apply RL to their problems. AutoRL applied to PdM has yet to be studied. Aimed at practitioners unfamiliar with complex RL tuning, we undertake an empirical study to understand how untuned RL algorithms perform when generating optimal tool replacement policies for milling machines. We compare a naïve implementation of REINFORCE against the policies of industry-grade implementations of three advanced algorithms: Deep Q-Network (DQN), Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO). Our broad goal was to study model performance under four scenarios: (1) simulated tool-wear data, (2) actual tool-wear data (benchmark IEEEDataPort PHM Society datasets), (3) univariate state with added noise levels and a random chance of break-down, and finally (4) complex multivariate state. Across 15 environment variants, REINFORCE models demonstrated higher tool replacement precision (0.687), recall (0.629) and F1 (0.609) than A2C (0.449/0.480/0.442), DQN (0.418/0.504/0.374) and PPO (0.472/0.316/0.345), while also showing lower variability. Comparing the best auto-selected models over ten training rounds produced unusually wide performance gaps, with REINFORCE precision, recall and F1 at 0.884, 0.884 and 0.873 against the best A2C (0.520/0.859/0.639), DQN (0.651/0.937/0.740), and PPO (0.558/0.643/0.580) models.
For REINFORCE, a basic hyperparameter sensitivity and interaction analysis is conducted to better understand the dynamics, and results are presented for the learning rate, the discount factor γ, and the network activation functions (ReLU and Tanh). Our study suggests that, in the untuned state, simpler algorithms like REINFORCE perform reasonably well. For AutoRL frameworks, this research encourages seeking new design approaches to automatically identify optimum algorithm-hyperparameter combinations.
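
The abstract names the discount factor γ as one of the hyperparameters examined for REINFORCE. As a rough illustration only, the sketch below shows where γ enters a generic REINFORCE update: each action's log-probability is weighted by the discounted return from that step onward. This is not the authors' implementation; the function name `discounted_returns` and the example reward values are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    Hypothetical helper for illustration; not taken from the paper.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up per-step rewards for a three-step episode.
example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(example)
```

A smaller γ makes the return depend mostly on near-term rewards, while γ close to 1 weights distant rewards almost as much as immediate ones.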

Pages: 37
