Human-computer interaction (HCI) with three-dimensional (3D) Building Information Modelling/Model (BIM) is crucial to enhancing the user experience and fostering the value of BIM. Current BIM applications mostly rely on the keyboard, mouse, or touchscreen as HCI media; these hardware devices can impose space constraints and lack visual intuitiveness. Somatosensory interaction, such as gesture interaction, is an emerging interaction modality that requires neither handheld equipment nor direct touch, and it therefore offers a potential solution to these problems. This paper proposes a computer-vision-based gesture interaction system for BIM. First, a set of gestures for BIM model manipulation was designed, grounded in human ergonomics; these gestures cover selection, translation, scaling, rotation, and restoration of the 3D model. Second, a gesture understanding algorithm dedicated to 3D model manipulation is introduced. An interaction system for 3D models based on machine vision and gesture recognition was then developed, and a series of systematic experiments was conducted to confirm the effectiveness of the proposed system. In various environments, including pure white backgrounds, offices, and conference rooms, and even when the user wears gloves, the system achieves an accuracy rate above 97% and maintains a frame rate of 26 to 30 frames per second. The experimental results show that the method performs well, confirming its feasibility, accuracy, and fluidity. Somatosensory interaction with 3D models enhances the interaction experience and operating efficiency between the user and the model, further expanding the application scenarios of BIM.
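The five manipulation operations named above can be sketched as a simple dispatch from recognized gesture labels to updates of a 3D-model state. This is a minimal illustrative sketch, not the paper's implementation: the state fields (`translation`, `scale`, `rotation_deg`, `selected`) and the parameter names (`delta`, `factor`, `angle_deg`) are assumptions chosen for clarity.

```python
import copy

# Illustrative model state; a real BIM viewer would manipulate scene-graph
# transforms rather than this flat dictionary.
INITIAL_STATE = {
    "translation": [0.0, 0.0, 0.0],  # model position offset
    "scale": 1.0,                    # uniform scale factor
    "rotation_deg": 0.0,             # rotation about the vertical axis
    "selected": False,               # whether the model is currently selected
}

def apply_gesture(state, gesture, **params):
    """Apply one recognized gesture label to the model state and return it."""
    if gesture == "select":
        state["selected"] = True
    elif gesture == "translate":
        dx, dy, dz = params.get("delta", (0.0, 0.0, 0.0))
        t = state["translation"]
        state["translation"] = [t[0] + dx, t[1] + dy, t[2] + dz]
    elif gesture == "scale":
        state["scale"] *= params.get("factor", 1.0)
    elif gesture == "rotate":
        state["rotation_deg"] = (state["rotation_deg"]
                                 + params.get("angle_deg", 0.0)) % 360.0
    elif gesture == "restore":
        # Reset every field back to the initial state.
        state.update(copy.deepcopy(INITIAL_STATE))
    else:
        raise ValueError(f"unknown gesture: {gesture!r}")
    return state

if __name__ == "__main__":
    model = copy.deepcopy(INITIAL_STATE)
    apply_gesture(model, "select")
    apply_gesture(model, "translate", delta=(1.0, 0.0, 0.0))
    apply_gesture(model, "rotate", angle_deg=90.0)
    apply_gesture(model, "restore")
    print(model == INITIAL_STATE)  # restoration returns to the initial state
```

In a full system, the gesture label and its continuous parameters (displacement, scale factor, rotation angle) would be produced per frame by the vision-based gesture recognizer and fed into a dispatch like this one.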