In recent years, crowd density estimation using deep learning techniques has attracted considerable research attention, with applications in public safety, crowd control, and video surveillance. This review paper examines deep learning approaches to crowd density estimation and their uses, with a particular emphasis on video surveillance. It gives an overview of the main approaches to crowd density estimation, such as multi-stage models, multi-scale strategies, attention mechanisms, and multi-feature fusion. It also reviews the benchmark datasets used to evaluate the performance of these methods, such as ShanghaiTech, UCF_CC_50, UCF-QNRF (Saleh et al. in Eng Appl Artif Intell 41:103-114, 2015), and VisDrone. In addition, it surveys recent advances in crowd behavior analysis using deep learning techniques and discusses the challenges and limitations of current crowd density estimation models. Finally, it identifies potential directions for future research in this field, including the use of unmanned aerial vehicles (UAVs) for crowd surveillance and the development of content-aware density maps for improved accuracy.