In recent years, simultaneous localization and mapping (SLAM) based on the fusion of LiDAR and vision has gained extensive attention in the fields of autonomous navigation and environment sensing. The limitations of single-sensor SLAM in feature-scarce (low-texture, repetitive-structure) and dynamic environments have prompted researchers to combine LiDAR with other sensors, and fusion with vision sensors in particular has proven effective. Coupled with deep learning and adaptive algorithms, this fusion handles a wide variety of situations. LiDAR excels at acquiring high-precision 3D spatial information and remains highly reliable in complex and dynamic environments. This paper analyzes the state of research, including the main results and findings, from early single-sensor SLAM to the current stage of LiDAR and vision fusion SLAM. By categorizing and summarizing the existing literature, it examines specific solutions to current problems (complexity of data fusion, computational burden and real-time performance, multi-scenario data processing, etc.), discusses the trends and limitations of current research, and looks forward to future research directions, including multi-sensor fusion, algorithm optimization, improved real-time performance, and expanded application scenarios. This review aims to provide guidelines and insights for the development of LiDAR and vision fusion SLAM, with a view to serving as a reference for further SLAM research.