Accurate wind power forecasting is essential for renewable energy platforms, because it lets the power system manage its supply more effectively and maintain grid reliability. Accurate forecasts are difficult to achieve, however, because wind power generation is inherently intermittent and variable, and wind time series are strongly nonlinear. Deep learning (DL) methods are useful when the data lack a well-defined structure, and they can forecast wind power with a degree of accuracy and consistency. Building on this, a novel modified Siamese transformer network (MST-Net) is proposed. Its multi-attention mechanism lets the model attend to different parts of the input sequences and better capture long-term dependencies for wind power forecasting. Because system efficiency must also be optimized, the parameters of a PID controller are fine-tuned with a self-adaptive mountain gazelle optimizer (SA-MGO) to minimize the Mean Squared Error (MSE) and the total harmonic distortion (THD). The proposed system is simulated in MATLAB. Its effectiveness is confirmed by comparing its predictions with those of other deep learning models, including a Siamese network, LSTM, and Transformer, using the following error measures: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), MSE, Mean Absolute Percentage Error (MAPE), Mean Square Root Error (MSRE), and Absolute Error (AE). The model's performance is evaluated on real-time datasets with a cap value of 1500 kW. MST-Net closely tracks the actual power trends, whereas the other models either underpredict or smooth the data excessively.
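The comparison relies on standard point-forecast error measures. The following is a minimal Python sketch of how MAE, MSE, RMSE, MAPE, and AE could be computed from actual and predicted power series. It is not the authors' MATLAB implementation. The function name `error_metrics` is hypothetical. "AE" is assumed here to mean the total absolute error over the test window, and MSRE is omitted because the abstract does not define it.

```python
import math
from typing import Dict, Sequence


def error_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: common error measures for wind power forecasts (kW)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    abs_errors = [abs(e) for e in errors]
    mse = sum(e * e for e in errors) / n

    # MAPE is undefined where actual power is zero (e.g., calm periods),
    # so those samples are skipped in this sketch.
    pct = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]

    return {
        "MAE": sum(abs_errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": 100.0 * sum(pct) / len(pct) if pct else float("nan"),
        # Assumption: AE is taken as the total absolute error over the window.
        "AE": sum(abs_errors),
    }
```

For example, `error_metrics([1000, 500, 0, 1500], [900, 550, 50, 1500])` gives an MAE of 50 kW and an RMSE of about 61.2 kW. Because RMSE squares the errors before averaging, it penalizes occasional large misses more heavily than MAE. This makes it a useful complement when models either underpredict or oversmooth peaks.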