Converging evidence has shown that human object recognition depends on observers' familiarity with objects' appearance. The more similar the objects are, the stronger this dependence, and the more important two-dimensional (2D) image information becomes for discriminating the objects from one another. The degree to which three-dimensional (3D) structural information is used, however, remains strongly debated. Previously, we showed that models allowing rotations of independent 2D templates in the image plane cannot account for human performance in discriminating novel object views that result from 3D rotation. We now present results from three models: generalized radial basis functions (GRBF); 2D closest-template matching that allows 2D affine transformations of independent 2D templates; and a Bayesian statistical estimator that integrates over all possible 2D affine transformations. The performance of the human observers relative to each model is better for the novel views than for the learned template views, implying that human observers generalize from learned to novel views better than the models do. The Bayesian estimator yields provably optimal performance among all models based on independent 2D templates and 2D affine transformations. Therefore, no model of 2D affine operations on independent 2D templates can account for the human observers' performance. We suggest that the human observers used the objects' 3D structural information, a conclusion also supported by the improvement in performance as the objects' 3D structural regularity increased.
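
To make the contrast between the two affine-based observers concrete, the following Python/NumPy sketch illustrates the distinction between closest-template matching (take the best-fitting transformation) and the Bayesian estimator (integrate the likelihood over all transformations). It is a minimal illustration, not the paper's implementation: it assumes Gaussian pixel noise, samples only rotations and uniform scalings from the 2D affine group on a discrete grid rather than integrating over the full continuous group, and all function names and parameters (warp, sigma, the grids of angles and scales) are hypothetical.

    import numpy as np
    from scipy.ndimage import affine_transform

    def warp(template, angle, scale):
        """Warp a template by a rotation+scale (a subset of the 2D affine
        group) about the image center."""
        c, s = np.cos(angle), np.sin(angle)
        A = scale * np.array([[c, -s], [s, c]])
        center = np.array(template.shape) / 2.0
        offset = center - A @ center  # keep the image center fixed
        # Note: affine_transform maps output coordinates through A back to
        # input coordinates, i.e. A specifies the inverse warp.
        return affine_transform(template, A, offset=offset, order=1,
                                mode="constant")

    def closest_template_score(image, templates, angles, scales):
        """2D closest-template matching: the minimum squared distance over
        all templates and all sampled affine transformations."""
        return min(
            np.sum((image - warp(t, a, s)) ** 2)
            for t in templates for a in angles for s in scales
        )

    def bayesian_score(image, templates, angles, scales, sigma=0.1):
        """Bayesian observer: marginalize (sum) the Gaussian likelihood over
        templates and transformations instead of taking the single best fit."""
        log_liks = np.array([
            -np.sum((image - warp(t, a, s)) ** 2) / (2.0 * sigma ** 2)
            for t in templates for a in angles for s in scales
        ])
        # Log-sum-exp for a numerically stable log marginal likelihood,
        # assuming a uniform prior over the sampled grid.
        m = log_liks.max()
        return m + np.log(np.mean(np.exp(log_liks - m)))

A discrimination decision would compare these scores across the candidate objects' template sets. The Bayesian observer's optimality claim in the text rests on this marginalization: rather than committing to one estimated transformation, it weights every transformation by how well it explains the image, which is the ideal use of the information available to any observer restricted to independent 2D templates and 2D affine operations.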