By leveraging a small amount of label information to achieve favorable performance, semi-supervised methods are more practical in real-world application scenarios. However, existing semi-supervised cross-modal retrieval methods mainly focus on preserving similarities and learning more consistent hash codes, yet they overlook the importance of constructing a joint abstract space shared by multi-modal embeddings. In this paper, we propose a novel Semi-supervised Cross-modal Hashing method with Joint Hyperboloid Mapping (SCH-JHM). First, SCH-JHM employs a diffusion-based teacher model to learn generalized semantic knowledge and produce pseudo-labels for unlabeled data. Second, SCH-JHM constructs an hourglass-shaped five-tuple plane for each retrieval task, built from the queries, positive pairs, negative pairs, semi-supervised positive pairs, and semi-supervised negative pairs involved in semi-supervised cross-modal retrieval. It then projects the 12 retrieval tasks spanning the image, text, video, and audio modalities into a joint hyperboloid space. Finally, the student model in SCH-JHM explores the latent semantic relevance between filtered heterogeneous entities, which can be regarded as a supervised process. Comprehensive experiments against state-of-the-art methods on three widely used datasets verify the effectiveness of the proposed approach.