Passive seismic interferometry (SI), which harnesses ambient noise or unconventional seismic sources, has attracted widespread attention in Earth science and resource exploration. Conventional SI rests on several assumptions, including a uniform distribution of subsurface sources, an adequate number of sources, and long recording periods. These assumptions are often not met in real-world settings, which degrades reconstruction quality and, in turn, imaging results. We therefore propose a passive SI method based on deep transfer learning. The method extracts empirical Green's functions in real time directly from noisy data, without preprocessing. It also goes beyond simple data retrieval: it accurately reconstructs the entire wavefield. We build a joint transformer-CNN network and train it in a supervised manner on intricate velocity models. We then use transfer learning to fine-tune the model, adapting it to new data that differ from the training dataset. Notably, our method requires only a small amount of data and can be applied to other velocity models without training a new neural network. We demonstrate the validity of the method through a series of numerical experiments. Compared with conventional methods, real-time passive SI reconstructs the subsurface structural response more efficiently and more accurately.
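
For context, the conventional SI baseline referred to above is commonly implemented by cross-correlating noise recordings at two receivers and stacking over many time windows. The sketch below illustrates that baseline only, not the proposed network. The function name `crosscorrelate_stack`, the window length, and the synthetic data (white noise with a pure time delay between two receivers) are assumptions made for illustration.

```python
import numpy as np

def crosscorrelate_stack(rec_a, rec_b, win_len):
    """Stack windowed cross-correlations of two noise recordings.

    Returns the lags (in samples) and the stacked correlation, which in
    conventional SI serves as an estimate of the inter-receiver response.
    """
    n_win = min(len(rec_a), len(rec_b)) // win_len
    n_fft = 2 * win_len  # zero-pad so the circular correlation does not wrap
    stack = np.zeros(n_fft)
    for k in range(n_win):
        a = rec_a[k * win_len:(k + 1) * win_len]
        b = rec_b[k * win_len:(k + 1) * win_len]
        # Frequency-domain correlation: c[m] = sum_n b[n + m] * a[n]
        spec = np.fft.rfft(b, n_fft) * np.conj(np.fft.rfft(a, n_fft))
        stack += np.fft.irfft(spec, n_fft)
    stack /= max(n_win, 1)
    # Reorder so that lags run from -win_len to win_len - 1
    lags = np.arange(-win_len, win_len)
    return lags, np.fft.fftshift(stack)

# Hypothetical synthetic test: receiver B sees the same noise as A, delayed.
rng = np.random.default_rng(0)
win_len = 512
delay = 25  # assumed travel time between receivers, in samples
noise = rng.standard_normal(20 * win_len)
rec_a = noise
rec_b = np.roll(noise, delay)  # rec_b[n] = noise[n - delay]

lags, ccf = crosscorrelate_stack(rec_a, rec_b, win_len)
peak_lag = int(lags[np.argmax(ccf)])
print("peak lag (samples):", peak_lag)
```

In this idealized setting the stacked correlation peaks at the imposed delay. With few sources, uneven source distribution, or short records, the stack converges poorly, which is the limitation that motivates the learning-based approach.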