The attention mechanism has recently gained immense importance in the natural language processing (NLP) world. This technique highlights the parts of the input text that an NLP task (such as translation) must pay "attention" to. Inspired by this, some researchers have recently applied deep-learning-based attention mechanisms from the NLP domain to predictive maintenance. However, Industry 4.0 predictive maintenance solutions, which often rely on edge computing, demand lighter predictive models than such deep-learning-based solutions provide. With this objective, we investigated the adaptation of a simpler, incredibly fast, and compute-resource-friendly attention method based on the Nadaraya-Watson estimator. We develop a method to predict tool wear of a milling machine using this attention mechanism and demonstrate, with the help of heat maps, how the attention mechanism highlights regions that assist in predicting the onset of tool wear. We validate the effectiveness of this adaptation on the benchmark IEEE DataPort PHM Society dataset by comparing it against other comparatively "lighter" machine learning techniques: Bayesian Ridge, Gradient Boosting Regressor, SGD Regressor, and Support Vector Regressor. Our experiments indicate that the proposed Nadaraya-Watson attention mechanism performed best, with an MAE of 0.069, RMSE of 0.099, and R² of 83.40%, compared with the next best technique, Gradient Boosting Regressor, at 0.100, 0.138, and 66.51% respectively. Additionally, it produced a lighter and faster model.

• We propose a Nadaraya-Watson estimator based "attention mechanism", applied to a predictive maintenance problem.
• Unlike the deep-learning-based attention mechanisms from the NLP domain, our method creates fast, light, and high-performance models suitable for edge-computing devices, thereby supporting the Industry 4.0 initiative.
• The method is validated on real tool-wear data from a milling machine.
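To illustrate the core idea, the Nadaraya-Watson estimator predicts a query's target as a kernel-weighted average of training targets; with a Gaussian kernel, the weights are exactly a softmax over negative scaled squared distances, i.e. an attention distribution. The sketch below is a minimal illustration of this connection, not the authors' implementation: the scalar inputs, bandwidth `h`, and toy data are assumptions for demonstration only.

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, h=1.0):
    # Gaussian-kernel weights; equivalent to softmax attention with
    # scores -(x_query - x_train)^2 / (2 h^2). The rows of `weights`
    # are the attention distributions that can be rendered as heat maps.
    scores = -0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Prediction is the attention-weighted average of training targets.
    return weights @ y_train, weights

# Toy example (hypothetical data, not the PHM dataset): noisy sine curve.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 5.0, 50))
y_train = np.sin(x_train) + 0.1 * rng.normal(size=50)
x_query = np.linspace(0.0, 5.0, 10)
y_pred, attn = nadaraya_watson(x_query, x_train, y_train, h=0.3)
```

Because the estimator has no trainable parameters beyond the bandwidth, "training" amounts to storing the data, which is one reason such a model stays light and fast relative to deep attention networks.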