Deep-learning models can capture nonlinear behavior and long-term dependencies, so they are widely used to model bearing degradation and predict remaining useful life. Their effectiveness depends on two primary factors: a well-designed network architecture and the integration of domain-specific knowledge. However, existing deep network architectures have limited capacity to capture the intricate probabilistic generative behavior underlying bearing degradation, and they often neglect domain-specific knowledge about mechanical degradation and faults. Both shortcomings considerably constrain the performance of the deep network. To address them, this study proposes a novel deep state-space model that combines variational autoencoders (VAEs) with state-space models to enhance the uncertainty representation capability of deep networks. By incorporating planar flows (PFs), the model relaxes the Gaussian assumption in traditional models, thereby improving its generalizability. On this basis, a pretraining approach is developed that incorporates domain knowledge into the prior weights of the predictive model, which reduces the dependence of the deep network on extensive sample data. Experimental results on real wind turbine bearing data demonstrate significant advantages of the proposed method over traditional approaches, showing its potential for more accurate and robust prediction of bearing degradation.
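
To illustrate how a planar flow can relax a Gaussian assumption, the following is a minimal NumPy sketch of a single planar-flow layer in its standard textbook form, f(z) = z + u tanh(wᵀz + b). It is not the authors' implementation: the function name `planar_flow`, the parameters `u`, `w`, `b`, the choice of `tanh`, and the invertibility reparameterization of `u` are all assumptions made for illustration.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """Apply one planar flow f(z) = z + u_hat * tanh(w.z + b).

    z: array of shape (n, d); u, w: arrays of shape (d,); b: scalar.
    Returns the transformed samples and log|det Jacobian| per sample.
    All names here are hypothetical, chosen for this sketch.
    """
    # Reparameterize u so that w.u_hat > -1, which keeps the map invertible.
    wu = w @ u
    m = -1.0 + np.logaddexp(0.0, wu)        # softplus(wu) - 1
    u_hat = u + (m - wu) * w / (w @ w)

    a = z @ w + b                           # pre-activation, shape (n,)
    h = np.tanh(a)
    f = z + np.outer(h, u_hat)              # transformed samples, shape (n, d)

    # Matrix determinant lemma: det(I + u_hat psi^T) = 1 + psi . u_hat
    psi = (1.0 - h ** 2)[:, None] * w       # tanh'(a) * w, shape (n, d)
    log_det = np.log(np.abs(1.0 + psi @ u_hat))
    return f, log_det

# Usage: push standard-Gaussian samples through the flow and track densities.
rng = np.random.default_rng(0)
d = 2
z0 = rng.normal(size=(1000, d))
log_q0 = -0.5 * np.sum(z0 ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
z1, log_det = planar_flow(z0, u=np.array([1.5, -0.5]), w=np.array([2.0, 1.0]), b=0.0)
log_q1 = log_q0 - log_det                   # change-of-variables formula
```

Stacking several such layers yields progressively more flexible, non-Gaussian densities while keeping the log-density tractable, which is the property that makes planar flows attractive in a variational setting.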