This paper presents a new approach to combining real video with synthetic objects. The proposed technology is intended for use in advanced animation, virtual reality, games, and related fields, where computer graphics has traditionally been used. Recently, some applications have added real video to graphic scenes to supply the realism that computer graphics lacks. This approach, called augmented or mixed reality, can produce a more realistic environment than computer graphics alone. Our approach differs from virtual reality and augmented reality in that computer-generated graphic objects are combined with 3-D structure extracted from monocular image sequences. The approach is realized in the following steps: (1) We derive the 3-D structure from test image sequences. Extracting the 3-D structure requires estimating 3-D depth and then constructing a height map. Because of the content of the test sequences, the height map represents the 3-D structure. (2) The height map is modeled by Delaunay triangulation or a Bezier surface, and each planar surface is texture-mapped. (3) Finally, graphic objects are combined with the height map. Because the 3-D structure of the height map is already known, Step (3) is straightforward. Following this procedure, we produced an animation video demonstrating the combination of the 3-D structure and graphic models. Users can navigate the realistic 3-D world, whose associated image is rendered on the display monitor.
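
The modeling and combination steps, (2) and (3), can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes the height map is sampled on a regular grid, where splitting each cell into two triangles gives one valid Delaunay triangulation, and it places a graphic object by reading the height of the triangulated surface under it. The names `triangulate_grid`, `surface_height`, and `place_object`, and the `height_map[row][col]` layout, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

HeightMap = List[List[float]]    # assumed layout: height_map[row][col]
Triangle = Tuple[int, int, int]  # vertex indices, index = row * cols + col


def triangulate_grid(rows: int, cols: int) -> List[Triangle]:
    """Split every grid cell into two triangles along one diagonal."""
    triangles: List[Triangle] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a = r * cols + c   # top-left corner of the cell
            b = a + 1          # top-right
            d = a + cols       # bottom-left
            e = d + 1          # bottom-right
            triangles.append((a, b, d))
            triangles.append((b, e, d))
    return triangles


def surface_height(height_map: HeightMap, x: float, y: float) -> float:
    """Height of the triangulated surface at column coordinate x, row coordinate y."""
    rows, cols = len(height_map), len(height_map[0])
    # Clamp to the cell that contains (x, y).
    c = min(max(int(x), 0), cols - 2)
    r = min(max(int(y), 0), rows - 2)
    u, v = x - c, y - r  # local coordinates inside the cell
    h_a = height_map[r][c]
    h_b = height_map[r][c + 1]
    h_d = height_map[r + 1][c]
    h_e = height_map[r + 1][c + 1]
    if u + v <= 1.0:
        # Point lies in triangle (a, b, d): interpolate linearly from corner a.
        return h_a + u * (h_b - h_a) + v * (h_d - h_a)
    # Point lies in triangle (b, e, d): interpolate linearly from corner e.
    return h_e + (1.0 - u) * (h_d - h_e) + (1.0 - v) * (h_b - h_e)


def place_object(height_map: HeightMap, x: float, y: float,
                 offset: float = 0.0) -> Tuple[float, float, float]:
    """Return a 3-D position that rests an object on the surface at (x, y)."""
    return (x, y, surface_height(height_map, x, y) + offset)
```

For example, with `height_map = [[0.0, 1.0], [2.0, 3.0]]`, calling `place_object(height_map, 0.5, 0.5, offset=1.0)` returns `(0.5, 0.5, 2.5)`. Because the surface height is known everywhere, positioning an object reduces to a single lookup, which is why the combination step is simple once the height map has been built.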