Student academic performance (SAP) prediction is becoming an indispensable service in computer-supported intelligent education systems. However, conventional machine learning-based methods can exploit only the sparse discriminative features of student behaviors in imbalanced academic datasets when predicting SAP. Furthermore, there is a lack of imbalanced-data processing mechanisms that can efficiently capture student characteristics and achievement. Therefore, we propose ProbSAP, a comprehensive and high-performance prediction framework to probe SAP characteristics on massive educational data, which resolves the data imbalance issue and improves the accuracy of course final mark prediction. It consists of three main components: a collaborative data processing module for enhancing data quality, a scalable metadata clustering module for alleviating the imbalance of academic features, and an XGBoost-enhanced SAP prediction module for academic performance forecasting. The collaborative data processing module integrates multi-dimensional academic data, providing a reliable supply for clustering and modeling in the ProbSAP framework. Comparative evaluation results demonstrate that ProbSAP delivers superior accuracy and efficiency in course final mark prediction for college students compared with other state-of-the-art methods such as CNN, SVR, RFR, XGBoost, Catboost-SHAP, and AS-SAN. On average, ProbSAP reduces the mean absolute error (MAE) by 84.76%, 72.11%, and 66.49% compared with XGBoost, Catboost-SHAP, and AS-SAN, respectively. It also leads to a better out-of-sample fit that keeps prediction errors between 1% and 9% for over 98% of actual samples. © 2023 Elsevier Ltd. All rights reserved.
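To make the reported MAE reductions concrete, the following minimal sketch shows how a relative MAE reduction of this kind is conventionally computed. It assumes the standard definitions (MAE as the mean of absolute differences, reduction as the percentage drop relative to a baseline). The function names and the marks below are hypothetical illustrations, not data or code from the study.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all samples."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mae, model_mae):
    """Percentage by which model_mae is lower than baseline_mae."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Hypothetical final marks and predictions, for illustration only.
actual_marks = [70, 85, 60, 90]
baseline_predictions = [60, 95, 70, 80]   # stands in for a baseline model
improved_predictions = [68, 86, 63, 88]   # stands in for an improved model

baseline_mae = mean_absolute_error(actual_marks, baseline_predictions)
improved_mae = mean_absolute_error(actual_marks, improved_predictions)
reduction_percent = relative_reduction(baseline_mae, improved_mae)

print(f"baseline MAE: {baseline_mae}, improved MAE: {improved_mae}")
print(f"relative MAE reduction: {reduction_percent}%")
```

Under this reading, a reduction of 84.76% against a baseline would mean the improved model's MAE is roughly 15% of the baseline's MAE on the same samples.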