Combining RGB images and their corresponding depth maps for semantic segmentation has proven effective in recent years. However, existing RGB-D fusion methods either lack non-linear feature fusion abilities or treat both modalities equally, disregarding their intrinsic distribution gap and information loss. In this study, we observe that depth maps are well suited to providing fine-grained patterns of objects due to their local depth continuity, while RGB images effectively offer a global view. Based on this observation, we propose a novel module, the pixel Differential Convolution Attention (DCA) module, which takes into account geometric information and local-range correlations for depth data. We further extend DCA to the Ensemble Differential Convolution Attention (EDCA), which propagates long-range contextual dependencies and seamlessly incorporates spatial distribution for RGB data. DCA and EDCA dynamically adjust convolutional weights according to pixel differences, enabling self-adaptation in the local and long-range contexts, respectively. We construct a two-branch network, the Differential Convolutional Network (DCANet), built on DCA and EDCA to fuse the local and global information of the two modalities, thereby emphasizing the individual advantages of RGB and depth data. Experimental results demonstrate that DCANet achieves a new state-of-the-art performance for RGB-D semantic segmentation on two challenging benchmark datasets: NYUv2 and SUN-RGBD.
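
As an illustrative aside, the sketch below shows one way a pixel-difference-modulated convolution attention could be realized, assuming a PyTorch-style implementation. The class name DCASketch, the neighborhood aggregation, and all hyper-parameters are hypothetical reading aids for the abstract, not the authors' actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DCASketch(nn.Module):
    """Hypothetical sketch of a pixel Differential Convolution Attention block.

    Each pixel aggregates its k x k neighborhood with weights derived from the
    feature difference to the center pixel, so the effective convolution adapts
    to local (e.g. depth) discontinuities. Illustrative only, not the paper's
    implementation.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.pad = kernel_size // 2
        # Shared 1x1 value projection applied before aggregation.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        v = self.proj(x)
        # Unfold k x k neighborhoods: (B, C, k*k, H*W).
        patches = F.unfold(v, self.k, padding=self.pad).view(b, c, self.k * self.k, h * w)
        center = v.view(b, c, 1, h * w)
        # Pixel-difference attention: neighbors with smaller differences to the
        # center receive larger weights (softmax over the k*k neighborhood).
        diff = patches - center
        attn = torch.softmax(-diff.abs().mean(dim=1, keepdim=True), dim=2)  # (B, 1, k*k, H*W)
        out = (attn * patches).sum(dim=2).view(b, c, h, w)
        return out + x  # residual connection


# Usage example on a dummy depth-branch feature map.
feat = torch.randn(2, 64, 32, 32)
out = DCASketch(64)(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

An EDCA-style counterpart for the RGB branch would, under the same reading, replace the fixed k x k window with a long-range (e.g. strip- or dilation-based) neighborhood while keeping the same difference-driven reweighting.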