Satellite-derived aerosol data from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), have received considerable attention and offer substantial information for estimating aerosols, as well as for air quality, climate, and health assessments. This study integrates multiple approaches: empirical modeling of PM2.5 from MERRA-2 aerosol components, namely black carbon (BC), organic carbon (OC), dust, sea salt, and sulfate (SO₄²⁻); ground-based PM2.5 measurements; and machine learning (ML) to simulate PM2.5 concentrations, with the aim of developing accurate and robust models for two climatically distinct regions of India: Rajasthan (arid/semi-arid) and Kerala (humid/semi-humid). The 42-year MERRA-2 aerosol datasets (1980-2022) were obtained at daily and monthly resolution, while hourly ground-based PM2.5 data were collected from fourteen continuous ambient air quality monitoring stations (CAAQMS) in Rajasthan and Kerala. First, we examined long-term trends in empirically estimated PM2.5 (MERRA2-EE-PM2.5) from 1980 to 2022; the results indicated a significant rise in PM2.5 levels in both Rajasthan and Kerala after 2000, largely attributed to commercialization and industrialization. A detailed correlation analysis of MERRA-2 aerosol components with PM2.5 showed that dust, BC, OC, and SO₄²⁻ were the main contributors in the arid and semi-arid regions of Rajasthan. In contrast, Kerala's humid regions showed differences among these components, highlighting the complexity of regional aerosol impacts. Further, MERRA2-EE-PM2.5 underestimates PM2.5 concentrations, implying the need for accurate weighting of each aerosol component and for advanced models that include meteorological parameters. Therefore, we built an ensemble of three ML models, random forest (RF), gradient boosting (GB), and k-nearest neighbour (k-NN), to estimate modelled PM2.5 from MERRA-2 aerosol components and meteorological parameters.
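To illustrate what an empirical estimate such as MERRA2-EE-PM2.5 involves, the sketch below computes a weighted sum of the five aerosol components. The abstract does not state the equation or weights used, so the default weights here are an assumption: they follow the reconstruction widely used with MERRA-2 (1.4 for OC to approximate organic matter, 1.375 for sulfate to approximate ammonium sulfate, and 1.0 for BC and the PM2.5 fractions of dust and sea salt). The names `empirical_pm25` and `DEFAULT_WEIGHTS` and the dictionary keys are hypothetical.

```python
# Assumed default weights (not taken from the abstract): the commonly used
# MERRA-2 surface PM2.5 reconstruction. Dust and sea salt are expected to be
# the PM2.5 size fractions of those species.
DEFAULT_WEIGHTS = {
    "dust": 1.0,
    "sea_salt": 1.0,
    "bc": 1.0,
    "oc": 1.4,        # organic carbon -> organic matter
    "sulfate": 1.375,  # sulfate ion -> ammonium sulfate
}


def empirical_pm25(components, weights=DEFAULT_WEIGHTS):
    """Return the weighted sum of aerosol component concentrations.

    `components` maps component name to surface mass concentration.
    All values must share one unit (for example ug/m^3 after unit
    conversion); the result is in that same unit.
    """
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing aerosol components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Illustrative values only, not data from the study.
sample = {"dust": 20.0, "sea_salt": 2.0, "bc": 3.0, "oc": 10.0, "sulfate": 8.0}
print(f"{empirical_pm25(sample):.1f}")  # prints 50.0
```

Passing a different `weights` dictionary is one way to express the region-specific weighting the abstract argues for; the function itself makes no claim about which weights suit Rajasthan or Kerala.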
The results showed that R² increased from 0.42 to 0.62 after meteorological parameters were included in the ML models. These findings highlight the limitations of the conventional MERRA2-EE-PM2.5 estimate and emphasize the need for customized regional models that weight each aerosol component according to the local climate. Our findings support the potential of machine learning as a tool to refine PM2.5 prediction, facilitating more accurate health risk assessments and air quality management policy-making in future studies.
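The ensemble of RF, GB, and k-NN regressors described above can be sketched with scikit-learn as follows. The abstract does not say how the three models were combined, so simple prediction averaging via `VotingRegressor` is an assumption, as are the hyperparameters, the 25% hold-out split, and the helper names `build_ensemble` and `fit_and_score`. The data are synthetic stand-ins, not the study's measurements.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_ensemble(seed=0):
    """Average the predictions of RF, GB, and k-NN regressors (assumed scheme)."""
    return VotingRegressor(
        estimators=[
            ("rf", RandomForestRegressor(n_estimators=100, random_state=seed)),
            ("gb", GradientBoostingRegressor(random_state=seed)),
            # k-NN is distance-based, so features are standardized first.
            ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
        ]
    )


def fit_and_score(X, y, seed=0):
    """Fit the ensemble and return R^2 on a held-out 25% test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = build_ensemble(seed).fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))


# Synthetic data: 5 aerosol component columns and 3 meteorological columns.
rng = np.random.default_rng(0)
aerosol = rng.gamma(shape=2.0, scale=5.0, size=(300, 5))
met = rng.normal(size=(300, 3))
pm25 = aerosol.sum(axis=1) * (1.0 + 0.3 * met[:, 0]) + rng.normal(scale=2.0, size=300)

r2_aerosol_only = fit_and_score(aerosol, pm25)
r2_with_met = fit_and_score(np.hstack([aerosol, met]), pm25)
print(f"R^2, aerosol components only: {r2_aerosol_only:.2f}")
print(f"R^2, aerosol + meteorology:   {r2_with_met:.2f}")
```

The scores printed here reflect only the synthetic data and are not comparable to the 0.42 and 0.62 reported above; the sketch shows the comparison of feature sets, not a reproduction of the result.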