The fusion of multi-modal data, e.g., pathology slides and genomic profiles, can provide complementary information and benefit glioma grading. However, genomic profiles are difficult to obtain due to high costs and technical challenges, which limits the clinical application of multi-modal diagnosis. In this work, we address the clinically relevant setting where paired pathology-genomic data are available during training, while only pathology slides are accessible at inference. To improve pathological grading models, we present a discrepancy- and gradient-guided distillation framework that transfers privileged knowledge from the multi-modal teacher to the pathology student. On the teacher side, to prepare useful knowledge, we propose a Discrepancy-induced Contrastive Distillation (DC-Distill) module that selects reliable contrastive samples based on the teacher-student discrepancy to regulate the feature distribution of the student. On the student side, since the teacher may convey incorrect information, we propose a Gradient-guided Knowledge Refinement (GK-Refine) module that builds a knowledge bank and adaptively absorbs reliable knowledge according to the agreement among gradients. Experiments on the TCGA GBM-LGG dataset show that the proposed distillation framework significantly improves pathological glioma grading and outperforms other knowledge distillation (KD) methods. Notably, using pathology slides alone, our method achieves performance comparable to existing multi-modal methods. The code is available at https://github.com/CityU-AIM-Group/MultiModal-learning.
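To make the gradient-agreement idea behind GK-Refine more concrete, the following is a minimal, self-contained sketch, not the paper's implementation: each candidate distillation loss from a hypothetical "knowledge bank" is re-weighted by how well its gradient agrees with the task-loss gradient on the student. All names (the toy student, `kd_losses`, the random teacher outputs) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def flat_grad(loss, params):
    # Flatten the gradient of `loss` w.r.t. `params` into one vector.
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).flatten()
        for g, p in zip(grads, params)
    ])


def gradient_agreement_weights(task_loss, kd_losses, params):
    # Weight each distillation loss by the non-negative cosine similarity
    # between its gradient and the task-loss gradient, so that conflicting
    # (negatively aligned) knowledge is suppressed.
    task_vec = flat_grad(task_loss, params)
    weights = []
    for kd_loss in kd_losses:
        kd_vec = flat_grad(kd_loss, params)
        cos = F.cosine_similarity(task_vec, kd_vec, dim=0)
        weights.append(torch.clamp(cos, min=0.0))
    return weights


if __name__ == "__main__":
    torch.manual_seed(0)
    student = nn.Linear(16, 2)                 # toy pathology student
    params = [p for p in student.parameters() if p.requires_grad]

    x = torch.randn(8, 16)                     # toy pathology features
    y = torch.randint(0, 2, (8,))              # toy grading labels
    teacher_logits = torch.randn(8, 2)         # frozen multi-modal teacher logits (random here)
    teacher_feats = torch.randn(8, 2)          # another hypothetical knowledge source

    logits = student(x)
    task_loss = F.cross_entropy(logits, y)

    # Toy "knowledge bank": logit distillation and feature matching.
    kd_losses = [
        F.kl_div(F.log_softmax(logits, dim=1),
                 F.softmax(teacher_logits, dim=1), reduction="batchmean"),
        F.mse_loss(logits, teacher_feats),
    ]

    weights = gradient_agreement_weights(task_loss, kd_losses, params)
    total = task_loss + sum(w.detach() * l for w, l in zip(weights, kd_losses))
    total.backward()
    print([round(float(w), 3) for w in weights])
```

The sketch only illustrates the weighting mechanism; the actual knowledge bank, losses, and refinement strategy are defined in the paper.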