Existing deep convolutional neural network (DCNN) models for hand gesture recognition based on surface electromyography (sEMG) incur high computational costs. Moreover, no comprehensive DCNN model handles both high-density sEMG (HD-sEMG) and low-density sEMG (LD-sEMG) in a subject-independent manner. To address these issues, this study proposes a lightweight convolutional neural network (CNN) model for sEMG-based subject-independent hand gesture recognition, evaluated on both HD-sEMG and LD-sEMG. In addition, we introduce a technique, joint classification with averaging probability (JCAP), to improve the final recognition accuracy at low computational cost. We conducted three experiments (Exp-I to Exp-III): Exp-I, optimization of the proposed model; Exp-II, comparison with benchmark models; and Exp-III, evaluation of model performance in a simulated real-time scenario. Our model achieved significantly better accuracy for all selected gestures while keeping computational complexity low, as measured by total parameters, inference time, floating-point operations (FLOPs), and selection time. In Exp-II, our best proposed model from Exp-I achieved the highest accuracy on ISRMyo-I, 85.75%. Compared with the smallest and fastest baseline methods, respectively, it has 8x fewer parameters and more than 94.8% fewer FLOPs, and its inference is around 20% faster. In Exp-III, the selection time of our best proposed model was more than 6x faster than that of the existing lightweight model. These strengths give our model an advantage in sEMG-based human-machine interface applications with limited computational resources, such as edge computing, a future trend for consumer electronics.
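As a rough illustration of the idea behind JCAP, the sketch below averages per-window class probabilities and picks the class with the highest mean. The abstract does not specify how windows are grouped or weighted, so the function name `jcap_predict`, the equal weighting, and the example probabilities are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def jcap_predict(window_probs):
    """Hypothetical JCAP-style decision: average softmax outputs, then argmax.

    window_probs: array-like of shape (n_windows, n_classes), one probability
    vector per sEMG window (assumed to come from the CNN's softmax layer).
    """
    probs = np.asarray(window_probs, dtype=float)
    mean_probs = probs.mean(axis=0)          # equal-weight average over windows
    return int(np.argmax(mean_probs)), mean_probs

# Illustrative values only: three windows, three gesture classes.
window_probs = [
    [0.6, 0.3, 0.1],   # this window alone would vote for class 0
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
label, mean_probs = jcap_predict(window_probs)
print(label, mean_probs)
```

Averaging costs only one mean and one argmax on top of the per-window forward passes, which is consistent with the claim that the accuracy gain comes with little extra computation.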