A theory of optimal h-p Petrov-Galerkin finite element approximations is presented. The optimal scheme is defined relative to a fine-mesh solution space and an arbitrary symmetric bilinear form. The optimal method leads to a symmetric, positive-definite stiffness matrix that is independent of the coefficients of the given problem, exhibits 'extra superconvergence' properties, and has a relative error that can be calculated exactly at each point in the problem domain. Various generalizations are also discussed, including the connection of these methods with certain preconditioning schemes.
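As a rough illustration of why such a scheme can yield a symmetric, coefficient-independent stiffness matrix, the sketch below builds one common form of optimal test functions for a 1D model problem. Everything in it is an assumption made for illustration and is not taken from the paper: the convection-diffusion model problem, the piecewise-linear fine and coarse spaces, the choice of the H1 seminorm as the symmetric form `B`, and the hypothetical function name `optimal_pg_demo`. Each test function w_i is taken to solve a(v, w_i) = b(v, phi_i) for all fine-mesh functions v, which in matrix form reads A^T W = B P.

```python
import numpy as np

def optimal_pg_demo(m=15, eps=0.05, beta=1.0):
    """Illustrative sketch only (assumed model problem, not the paper's):
    -eps*u'' + beta*u' = 1 on (0,1), u(0)=u(1)=0, piecewise-linear elements.
    Fine mesh: n = 2m+1 interior nodes. Coarse trial space: every second node."""
    n = 2 * m + 1
    h = 1.0 / (n + 1)
    off = np.ones(n - 1)

    # Fine-mesh matrices with the convention M[i, j] = form(phi_j, phi_i).
    K = (2.0 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)) / h  # (u', v')
    C = (np.diag(off, 1) - np.diag(off, -1)) / 2.0                  # (u', v)
    A = eps * K + beta * C          # nonsymmetric problem matrix
    F = h * np.ones(n)              # load vector for f = 1

    # Assumed symmetric form: the H1 seminorm, b(u, v) = (u', v').
    B = K

    # Prolongation: coarse hat functions expressed in the fine basis.
    P = np.zeros((n, m))
    for j in range(m):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]

    # Optimal test functions: a(v, w_i) = b(v, phi_i) for all fine v,
    # i.e. A^T W = B P.
    W = np.linalg.solve(A.T, B @ P)

    # Petrov-Galerkin system on the coarse trial space.
    S = W.T @ A @ P                 # equals P^T B P up to round-off
    c = np.linalg.solve(S, W.T @ F)

    # Reference quantities: fine solution and its B-orthogonal projection.
    u_h = np.linalg.solve(A, F)
    c_proj = np.linalg.solve(P.T @ B @ P, P.T @ B @ u_h)
    return {"S": S, "P": P, "B": B, "c": c, "c_proj": c_proj, "u_h": u_h}
```

Under these assumptions the Petrov-Galerkin matrix W^T A P reduces to P^T B P, so it is symmetric, positive-definite, and unchanged when `eps` or `beta` vary, and the coarse solution is the B-orthogonal projection of the fine-mesh solution. For this particular 1D choice of `B`, that projection coincides with nodal interpolation, so the coarse solution matches the fine one at the coarse nodes; this is a property of the illustrative setup and is not a statement of the paper's superconvergence result.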