The objective of infrared and visible image fusion is to generate a unified image that highlights prominent targets and retains fine texture details, even when the information in the source images is imbalanced. However, current image fusion algorithms primarily consider factors such as illumination, which restricts their applicability to certain scenes and compromises their adaptability. To address this issue, this paper proposes DGFusion, which uses TWSSLoss to balance the contributions of the source images in the fused output, effectively mitigating the limitations of relying solely on illumination guidance. Additionally, the modality-complement feature attention harmonizer (MCFAH) performs cross-modal channel attention learning: it assigns weights to features and accomplishes fusion by exchanging cross-modal differential information, thereby enriching each feature with details from the other modality. Furthermore, the multi-convolution attentive net (MCAN) dynamically adjusts the contributions of features from different modalities, prioritizing the most expressive characteristics to accentuate complementary information and enable efficient fusion. Our method outperforms seven state-of-the-art alternatives in preserving target details and texture information, and generalization experiments across five diverse datasets demonstrate the robustness of DGFusion in varied scenarios.
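To make the cross-modal weighting idea concrete, the following is a minimal, illustrative PyTorch sketch of channel attention in the spirit of the MCFAH description: each modality's features are reweighted using channel statistics from the other modality, and differential information is exchanged before fusion. The layer sizes, the squeeze-excitation-style MLP, global average pooling, and the residual exchange step are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn


class CrossModalChannelAttention(nn.Module):
    """Illustrative cross-modal channel-attention fusion (not the exact MCFAH)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-excitation-style MLP producing per-channel weights in [0, 1].
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, f_ir: torch.Tensor, f_vis: torch.Tensor) -> torch.Tensor:
        # Global average pooling: (B, C, H, W) -> (B, C)
        s_ir = f_ir.mean(dim=(2, 3))
        s_vis = f_vis.mean(dim=(2, 3))
        # Cross-modal weighting: each modality is reweighted by attention
        # derived from the other modality's channel statistics.
        w_ir = self.mlp(s_vis).unsqueeze(-1).unsqueeze(-1)
        w_vis = self.mlp(s_ir).unsqueeze(-1).unsqueeze(-1)
        # Exchange differential information: blend in the complementary
        # residual from the other modality before summing.
        f_ir_enh = f_ir * w_ir + (f_vis - f_ir) * (1 - w_ir)
        f_vis_enh = f_vis * w_vis + (f_ir - f_vis) * (1 - w_vis)
        return f_ir_enh + f_vis_enh


if __name__ == "__main__":
    # Usage: fuse two 64-channel feature maps from infrared and visible branches.
    fusion = CrossModalChannelAttention(channels=64)
    ir_feat = torch.randn(1, 64, 128, 128)
    vis_feat = torch.randn(1, 64, 128, 128)
    fused = fusion(ir_feat, vis_feat)
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```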