Stance detection is an important task in opinion mining that aims to determine whether the author of a text is in favor of, against, or neutral towards a specific target. The scarcity of annotated data remains one of the open problems in stance detection. In this paper, we propose a Stance-Emotion joint Data Augmentation with Gradual Prompt-tuning (SEGP) model to address this problem. To generate more training samples, we propose an auxiliary-sentence-based Stance-Emotion joint Data Augmentation (SEDA) method, which formulates data augmentation as a conditional masked language modeling task. We leverage different relations between stance and emotion to construct auxiliary sentences, and SEDA generates augmented samples by predicting the masked words conditioned on both their context and these auxiliary sentences. Furthermore, to make better use of the augmented samples, we propose a Gradual Prompt-tuning method that combines prompt-tuning with curriculum learning. Specifically, the model starts by training on only the original samples and then adds augmented samples as training progresses. Experimental results show that SEGP significantly outperforms state-of-the-art approaches.
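The two ideas above can be illustrated with a minimal sketch. It is not the SEGP implementation: the auxiliary-sentence template, the masking rate, the `[SEP]` and `[MASK]` strings, the linear schedule, and the function names `build_masked_input` and `gradual_training_pool` are all assumptions made for illustration. The step in which a masked language model predicts the masked words is omitted.

```python
import random


def build_masked_input(text, target, stance, emotion,
                       mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Prepend a (hypothetical) auxiliary sentence and mask some words.

    A masked language model would then fill the masked positions,
    conditioned on both the remaining context and the auxiliary sentence.
    """
    rng = random.Random(seed)
    # Assumed template; the relations actually used between stance and
    # emotion are not specified here.
    auxiliary = (f"The stance towards {target} is {stance} "
                 f"and the emotion is {emotion}.")
    words = text.split()
    n_mask = max(1, round(mask_rate * len(words)))
    positions = set(rng.sample(range(len(words)), n_mask))
    masked = [mask_token if i in positions else w
              for i, w in enumerate(words)]
    return f"{auxiliary} [SEP] {' '.join(masked)}"


def gradual_training_pool(original, augmented, epoch, total_epochs,
                          warmup_epochs=1):
    """Return the training pool for one epoch.

    Only original samples are used during warm-up; afterwards a linearly
    growing prefix of the augmented samples is added (assumed schedule).
    """
    if epoch < warmup_epochs or total_epochs <= warmup_epochs:
        return list(original)
    steps_done = epoch - warmup_epochs + 1
    steps_total = total_epochs - warmup_epochs
    n_aug = min(len(augmented), len(augmented) * steps_done // steps_total)
    return list(original) + list(augmented[:n_aug])


if __name__ == "__main__":
    print(build_masked_input(
        "We must act now to protect the planet",
        target="climate action", stance="favor", emotion="hope"))
    original = ["o1", "o2", "o3"]
    augmented = ["a1", "a2", "a3", "a4", "a5", "a6"]
    for epoch in range(4):
        pool = gradual_training_pool(original, augmented, epoch, 4)
        print(epoch, len(pool))
```

In this sketch the pool contains only the original samples in the first epoch and all augmented samples by the last one. If a difficulty score were available, the augmented list could be sorted by it beforehand so that easier samples enter the pool first.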