In recent years, blended learning has been widely adopted in universities, generating complex and diverse learning data. This study uses machine learning algorithms to extract useful information from these data for early prediction of student performance. Current related research still has several problems, including the neglect of short text data from online learning, data imbalance, and insufficient utilisation of multimodal data. To address these issues, this study proposes a three-part solution. Firstly, the objective function of the generative adversarial network’s generator is adjusted to address data imbalance in student performance prediction, improving the prediction ability for minority-category students. Secondly, short text data from online learning are used to map the emotions associated with students’ learning states, enhancing the model’s accuracy and generalisation ability. Finally, this study introduces a multimodal generative adversarial network performance prediction model that fuses multimodal data and improves the accuracy and comprehensibility of prediction. © 2025 Inderscience Enterprises Ltd.