With the advance of Convolutional Neural Networks (CNNs), deep hashing methods have shown promising performance in large-scale image retrieval. Because it does not depend on extensive human-annotated data, unsupervised hashing is more applicable to image retrieval tasks than supervised methods. However, due to the lack of fine-grained supervisory signals and multi-similarity constraints, most state-of-the-art unsupervised deep hashing algorithms cannot ensure the correct fine-grained similarity ranking for image pairs. In this paper, we propose a novel unsupervised deep multi-similarity hashing framework that learns compact binary codes by jointly exploiting global-aware and spatial-aware representations, called Unsupervised Deep Multi-Similarity Hashing with Semantic Structure (UDMSH). Specifically, to obtain distinguishing characteristics, we develop a sub-network that jointly learns global semantic structures from a CNN and inherent spatial structures from a Fully Convolutional Network (FCN). By computing the cosine distance between deep features of image pairs, we construct a similarity matrix with semantic structure, and then use this matrix to guide the hash-code learning process. Based on this matrix, we carefully design a multi-level pairwise loss to preserve the correct fine-grained similarity ranking. Furthermore, we introduce a Hamming-isometric mapping into the unsupervised hashing framework to reduce quantization error. Extensive experiments on three widely used benchmarks show that the proposed UDMSH outperforms several state-of-the-art unsupervised hashing methods with respect to different evaluation metrics.
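To make the similarity-matrix construction concrete, the following is a minimal NumPy sketch of the general idea: L2-normalized deep features give pairwise cosine similarities, which are then split by thresholds into confidently similar, confidently dissimilar, and uncertain pairs. The function name and the threshold values `t_pos`/`t_neg` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def similarity_structure(features, t_pos=0.8, t_neg=0.2):
    """Build a pairwise semantic-structure matrix from deep features.

    features: (n, d) array of deep descriptors, one row per image.
    t_pos / t_neg: hypothetical thresholds splitting pairs into
    similar (+1), dissimilar (-1), and uncertain (0).
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos = normed @ normed.T                 # (n, n) cosine similarities
    S = np.zeros_like(cos)
    S[cos >= t_pos] = 1.0                   # confidently similar pairs
    S[cos <= t_neg] = -1.0                  # confidently dissimilar pairs
    return cos, S

feats = np.random.default_rng(0).normal(size=(4, 8))
cos, S = similarity_structure(feats)
```

A matrix of this form can then supervise hash-code learning, e.g. by weighting pairwise losses differently for the three pair categories, which is the role the semantic-structure matrix plays in guiding the multi-level pairwise loss.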