Hyperspectral image (HSI) and multispectral image (MSI) fusion, referred to as HSI-MSI fusion, merges a low-spatial-resolution HSI with a high-spatial-resolution MSI to generate a high-spatial-resolution HSI (HR-HSI). The primary challenge in HSI-MSI fusion is how to extract 1-D spectral features and 2-D spatial features from the HSI and MSI and combine them harmoniously. Recently, coupled tensor decomposition (CTD)-based methods have shown promising performance in this task. However, the tensor decompositions (TDs) used by these CTD-based methods have difficulty extracting complex features and capturing 2-D spatial features, which leads to suboptimal fusion results. To address these issues, we introduce a novel method called coupled tensor double-factor (CTDF) decomposition. Specifically, we propose a tensor double-factor (TDF) decomposition, which represents a third-order HR-HSI as a fourth-order spatial factor and a third-order spectral factor connected through a tensor contraction. Other TDs use factors whose order is at most that of the HR-HSI; the TDF, in contrast, has a spatial factor of higher order than the HR-HSI, which gives it a stronger feature extraction capability. Moreover, the fourth-order spatial factor allows the TDF to extract 2-D spatial features. We apply the TDF to the HSI-MSI fusion problem and formulate the CTDF model. Furthermore, we design an algorithm based on proximal alternating minimization (PAM) to solve this model and analyze its computational complexity and convergence. Simulated and real experiments validate the effectiveness and efficiency of the proposed CTDF method.
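To make the factor shapes concrete, the following is a minimal NumPy sketch of a TDF-style reconstruction and of how a coupled model can push each degradation onto a single factor. It assumes (this is not stated above) that the spatial factor has shape (I, J, R1, R2), the spectral factor has shape (R1, R2, K), and the contraction runs over the two shared rank modes R1 and R2. The function name `tdf_reconstruct`, all sizes, and the degradation operators `P1`, `P2` (average pooling) and `P3` (a random stand-in spectral response) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def tdf_reconstruct(spatial, spectral):
    """Contract a 4th-order spatial factor (I, J, R1, R2) with a 3rd-order
    spectral factor (R1, R2, K) over the shared rank modes R1 and R2.
    The contraction pattern is an assumption made for this sketch."""
    return np.einsum("ijab,abk->ijk", spatial, spectral)

rng = np.random.default_rng(0)
I, J, K = 32, 32, 40  # HR-HSI size (hypothetical)
R1, R2 = 4, 3         # rank sizes of the shared modes (hypothetical)
d = 4                 # spatial downsampling ratio (hypothetical)
k_msi = 4             # number of MSI bands (hypothetical)

A = rng.standard_normal((I, J, R1, R2))  # 4th-order spatial factor
B = rng.standard_normal((R1, R2, K))     # 3rd-order spectral factor
X = tdf_reconstruct(A, B)                # 3rd-order HR-HSI, shape (I, J, K)

# Hypothetical degradation operators.
P1 = np.kron(np.eye(I // d), np.ones((1, d)) / d)  # (I/d, I) row pooling
P2 = np.kron(np.eye(J // d), np.ones((1, d)) / d)  # (J/d, J) column pooling
P3 = rng.random((k_msi, K))                        # stand-in spectral response

# Degrade the full HR-HSI directly.
Y_h = np.einsum("pi,qj,ijk->pqk", P1, P2, X)  # low-resolution HSI
Y_m = np.einsum("ck,ijk->ijc", P3, X)         # high-resolution MSI

# Coupled view: spatial degradation touches only the spatial factor,
# spectral degradation touches only the spectral factor.
A_h = np.einsum("pi,qj,ijab->pqab", P1, P2, A)
B_m = np.einsum("ck,abk->abc", P3, B)
Y_h_from_factors = tdf_reconstruct(A_h, B)
Y_m_from_factors = tdf_reconstruct(A, B_m)

print(np.allclose(Y_h, Y_h_from_factors), np.allclose(Y_m, Y_m_from_factors))
```

Both comparisons hold because the contraction is linear in each factor and the spatial and spectral modes live in different factors. Under these assumptions, the HSI and MSI share the spectral factor and the spatial factor respectively, which is the coupling idea the abstract describes. The sketch does not attempt the PAM solver.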