Background: As visual simultaneous localization and mapping (SLAM) is primarily based on the assumption of a static scene, dynamic objects in the frame cause problems such as degraded system robustness and inaccurate position estimation. In this study, we propose YGC-SLAM, a SLAM system for indoor dynamic environments built on the ORB-SLAM2 framework that combines semantic and geometric constraints to improve the positioning accuracy and robustness of the system.

Methods: First, the recognition accuracy of YOLOv5 was improved by introducing the convolutional block attention module (CBAM) and an improved EIoU loss function, so that the prediction box converges quickly for better detection. The improved YOLOv5 was then added to the tracking thread for dynamic target detection, eliminating dynamic points. Subsequently, multi-view geometric constraints were used for re-judging, further eliminating dynamic points while retaining more useful feature points and preventing the semantic approach from over-eliminating feature points, which can cause map building to fail. The K-means clustering algorithm was used to accelerate this process, quickly computing the motion state of each cluster of pixel points rather than of each point individually. Finally, a keyframe-selection strategy with de-redundancy was implemented to construct a clear 3D dense static point-cloud map.

Results: In tests on the TUM dataset and in a real environment, our algorithm reduced the absolute trajectory error by 98.22% and the relative trajectory error by 97.98% compared with the original ORB-SLAM2, and it is more accurate and achieves better real-time performance than similar algorithms such as DynaSLAM and DS-SLAM.

Conclusions: The YGC-SLAM proposed in this study can effectively eliminate the adverse effects of dynamic objects, allowing the system to better complete positioning and map-building tasks in complex environments.