Human activity recognition (HAR) entails analyzing and interpreting data to infer human activity accurately. Deep learning techniques based on convolutional neural networks detect and classify human activity. However, convolutional layers in deep learning models typically have many parameters and require many floating-point operations (FLOPs), which makes real-time inference challenging on Internet of Things (IoT) devices, even though their continuous data collection makes them well suited to HAR. This study addresses this problem by introducing a lightweight, depthwise residual network squeeze-and-excitation (ResNet-SE) model for HAR. By employing depthwise convolutions, the proposed model considers the spatial and channel characteristics of the data independently, which enables efficient computation. Extensive performance evaluation experiments were conducted on three public HAR datasets (WISDM, UCI-HAR, and PAMAP2). The best results surpassed those of state-of-the-art HAR models: an accuracy of 0.945 with 61,298 parameters and a 3.54-second inference time on the WISDM dataset, 0.997 with 60,134 parameters and a 0.47-second inference time on the UCI-HAR dataset, and 0.974 with 61,004 parameters and a 0.347-second inference time on the PAMAP2 dataset. The proposed model trained on the PAMAP2 dataset was deployed in an IoT device environment and tested using experimental data. The results demonstrate that the proposed model exhibits fast inference times and lower energy consumption and CPU use, even on IoT devices. It also achieves higher accuracy with actual data, highlighting its suitability for IoT environments. Overall, the proposed lightweight and highly practical model displays superior activity detection capabilities compared to existing models.
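To make the efficiency argument concrete, the following minimal sketch counts the parameters of a standard 1D convolution, a depthwise convolution followed by a pointwise (1x1) convolution, and a squeeze-and-excitation block. The channel widths, kernel size, and reduction ratio are illustrative assumptions, not the configuration of the proposed ResNet-SE model, and the function names are hypothetical.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Standard 1D convolution: every output channel sees every input channel."""
    return c_in * c_out * k + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in, c_out, k, bias=True):
    """Depthwise convolution (one k-tap filter per input channel)
    followed by a pointwise 1x1 convolution that mixes channels."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def se_block_params(channels, reduction=4, bias=True):
    """Squeeze-and-excitation block: two fully connected layers
    (channels -> channels // reduction -> channels) applied to
    globally pooled channel statistics."""
    hidden = max(1, channels // reduction)
    squeeze = channels * hidden + (hidden if bias else 0)
    excite = hidden * channels + (channels if bias else 0)
    return squeeze + excite


if __name__ == "__main__":
    # Assumed sizes for illustration only (not taken from the paper).
    c_in, c_out, k = 64, 64, 5
    standard = conv1d_params(c_in, c_out, k)
    separable = depthwise_separable_conv1d_params(c_in, c_out, k)
    se = se_block_params(c_out, reduction=4)
    print("standard conv params:           ", standard)
    print("depthwise + pointwise params:   ", separable)
    print("squeeze-and-excitation params:  ", se)
    print("separable / standard ratio:     ", round(separable / standard, 3))
```

Under these assumed sizes, the separable layer needs roughly a fifth of the parameters of the standard layer, and the squeeze-and-excitation block adds only a small overhead on top. This is the kind of saving that lets a model stay in the tens of thousands of parameters.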