Multimodal Sentiment Analysis (MSA) aims to analyze the attitudes of speakers from video content. Previous methods focus on learning consistent cross-modal sentiment representations through multimodal interactions, treating each modality equally. However, individual modalities are often incomplete and uncertain, suffering from, e.g., noise and semantic ambiguity. A modality with low uncertainty contributes more to the final loss, suppressing the optimization of modalities with high uncertainty. To address this problem, we propose a new Uncertainty-aware Gradient modulation and Feature masking model (UGF) for MSA, which assists the optimization of highly uncertain modalities. Specifically, we propose a novel uncertainty estimation method that considers both intra- and inter-modality consistency. We improve the model in two ways. First, we design a dynamic gradient modulation (DGM) module that amends the optimization of each modality by dynamically modulating the gradients of the modality encoders according to their uncertainties. Second, we propose an uncertainty-guided feature masking (UFM) module that adaptively adds noise to more deterministic modalities, making the model pay more attention to uncertain ones. We conduct extensive experiments on three popular datasets: MOSI, MOSEI, and CH-SIMS. Experimental results show that the proposed UGF achieves competitive performance, and ablation studies demonstrate the effectiveness of each proposed component.
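To make the two mechanisms concrete, the following is a minimal PyTorch sketch of the gradient-modulation and feature-masking ideas, assuming per-modality uncertainty scores in [0, 1] are already available; the uncertainty estimation itself (intra- and inter-modality consistency) is not shown, and all module names, feature dimensions, and scaling rules here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSAModel(nn.Module):
    """Toy multimodal model: one encoder per modality plus a fusion head."""
    def __init__(self, dims=None, hidden=128):
        super().__init__()
        dims = dims or {"text": 768, "audio": 74, "video": 35}  # assumed dims
        self.encoders = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.head = nn.Linear(hidden * len(dims), 1)  # sentiment regression

    def forward(self, inputs):
        feats = [torch.relu(self.encoders[m](x)) for m, x in inputs.items()]
        return self.head(torch.cat(feats, dim=-1))

def ufm_mask(inputs, uncertainty, base_std=0.1):
    # UFM-like step (sketch): inject stronger noise into more deterministic
    # (low-uncertainty) modalities so the model attends to uncertain ones.
    return {m: x + torch.randn_like(x) * base_std * (1.0 - uncertainty[m])
            for m, x in inputs.items()}

def modulate_gradients(model, uncertainty):
    # DGM-like step (sketch): rescale each encoder's gradients by its
    # uncertainty, damping the dominant low-uncertainty modality so it does
    # not suppress the optimization of the others.
    for m, enc in model.encoders.items():
        for p in enc.parameters():
            if p.grad is not None:
                p.grad.mul_(uncertainty[m])

model = MSAModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = {"text": torch.randn(8, 768), "audio": torch.randn(8, 74),
         "video": torch.randn(8, 35)}
target = torch.randn(8, 1)
uncertainty = {"text": 0.2, "audio": 0.9, "video": 0.7}  # assumed estimates

opt.zero_grad()
loss = F.mse_loss(model(ufm_mask(batch, uncertainty)), target)
loss.backward()
modulate_gradients(model, uncertainty)  # rescale gradients before the step
opt.step()
```

In this sketch, multiplying an encoder's gradients by its uncertainty score slows the confident (dominant) modality rather than boosting the uncertain ones, which is one simple way to realize the balancing behavior the abstract describes; the actual modulation rule in UGF may differ.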