As network technologies continue to evolve, the Internet of Things (IoT) has permeated many sectors of society. Over the past decade, however, the number of cyberattacks discovered each year has grown exponentially, inflicting severe damage on economic development. To address the high false alarm rate, poor classification performance, and overfitting of current intrusion detection systems, this paper proposes an efficient hierarchical intrusion detection model named ET-DCANET. First, the extremely randomized trees (Extra-Trees) algorithm is used for feature selection to obtain an optimal feature subset. Next, dilated convolution and a dual attention mechanism (channel attention and spatial attention) are introduced. By gradually reducing the dilation rate of the dilated convolutions, the model moves from coarse-grained to fine-grained learning, and the DCNN and dual attention modules are progressively refined so that their synergy can be exploited to extract spatial and temporal features. This coarse-to-fine transition helps balance global and local information in complex data and improves the model's performance and generalization ability. To address class imbalance in the datasets, the EQLv2 loss function replaces the conventional cross-entropy (CE) loss, directing the model's attention toward minority-class samples and improving overall performance. The proposed model achieves accuracies of 99.68%, 98.50%, and 99.85% on the NSL-KDD, UNSW-NB15, and X-IIoTID datasets, respectively.
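The coarse-to-fine idea can be illustrated with a minimal, dependency-free sketch of stacked 1D dilated convolutions whose dilation rate shrinks from layer to layer. This is not the ET-DCANET implementation: the dilation schedule `(4, 2, 1)`, the kernel size of 3, the fixed all-ones weights, and the helper names `dilated_conv1d` and `receptive_field` are all assumptions chosen for illustration, since the abstract does not state these values.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Valid' 1D dilated convolution (cross-correlation form, no padding)."""
    # Number of input positions spanned by one output element.
    span = (len(kernel) - 1) * dilation + 1
    if len(x) < span:
        raise ValueError("input shorter than the dilated kernel span")
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical coarse-to-fine schedule: the dilation rate shrinks layer by layer.
DILATIONS = (4, 2, 1)
# Hypothetical fixed weights standing in for learned ones.
KERNEL = (1.0, 1.0, 1.0)

signal = [float(i) for i in range(20)]
features = signal
for d in DILATIONS:
    features = dilated_conv1d(features, KERNEL, d)
    span = (len(KERNEL) - 1) * d + 1
    print(f"dilation={d}: kernel span={span}, output length={len(features)}")

print("stacked receptive field:", receptive_field(len(KERNEL), DILATIONS))
```

In this sketch, the first layer covers a wide span of the input with sparse taps (coarse context), while later layers with smaller dilation rates sample adjacent positions (fine detail). In a real model the weights would be learned and attention modules would follow the convolutions.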