The RGB-D salient object detection algorithm simulates human attention behavior and attempts to locate the most visually prominent object(s) in a pair of RGB and depth images. Existing works often follow a deterministic decoding network, and few methods explicitly consider how to establish connections between features at different levels. To this end, we propose a cascaded refined RGB-D salient object detection network based on the attention mechanism (CRNet), whose primary contribution is a cascaded refined upsampling network layout. Specifically, we develop an adaptive channel transformation ratio α in the micro-modification module of convolutional block attention (MM), which adaptively adjusts the feature channel conversion ratio according to the level of the original input depth feature so as to maximize the integration of contextual information during the feature extraction phase. For multi-modal feature interaction, we propose a contextual feature aggregation module (ACF) consisting of separable convolution, dilated convolution, and adaptive average pooling, which enlarges the receptive fields of the multi-modal fused features, reduces redundant information, and suppresses background noise. Furthermore, we propose a novel cascaded refined upsampling network, a precise refining process comprising personal refinement, team expansion, and sequential execution operations, most of which are carried out in a new attention-based sequential refinement module (SRM-Wm). CRNet is trained under the supervision of a new hybrid loss function. Experimental results show that our model is structurally simple yet highly effective, outperforming 19 state-of-the-art methods on six public datasets under four metrics (∼1.6% improvement in F-measure over the top-ranked model, BBSNet-TIP2021). The code and results of our method are available at https://github.com/guanyuzong/CR-Net.
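To make the ACF design described above concrete, the following is a minimal PyTorch sketch of a three-branch aggregation block of this kind. The class and parameter names (`ACFSketch`, `channels`) are hypothetical; this illustrates the general combination of separable convolution, dilated convolution, and adaptive average pooling rather than the authors' implementation, which is available at the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACFSketch(nn.Module):
    """Hypothetical sketch of a contextual feature aggregation block:
    separable-conv, dilated-conv, and global-pooling branches fused by
    a 1x1 convolution. Not the authors' implementation."""

    def __init__(self, channels: int):
        super().__init__()
        # Depthwise-separable convolution branch: local context, few parameters.
        self.separable = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        # Dilated convolution branch: enlarged receptive field.
        self.dilated = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        # Adaptive average pooling branch: image-level context.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.pool_proj = nn.Conv2d(channels, channels, 1)
        # 1x1 convolution fusing the three branches back to `channels`.
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.separable(x)
        d = self.dilated(x)
        g = self.pool_proj(self.pool(x))
        # Broadcast the pooled global descriptor back to the spatial size.
        g = F.interpolate(g, size=x.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([s, d, g], dim=1))

# Usage: aggregate a fused RGB-D feature map with 64 channels.
feat = torch.randn(2, 64, 28, 28)
out = ACFSketch(64)(feat)  # shape: (2, 64, 28, 28)
```

Each branch contributes context at a different scale: the separable convolution captures local detail cheaply, the dilated convolution widens the receptive field without adding parameters over a plain 3×3 convolution, and the pooled branch injects global context, which is one plausible way such a module can suppress background noise in the fused features.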