Increasingly, machines are interacting with people through human action recognition from video streams. Video data can naturally be represented as a third-order data tensor. Although many tensor-based approaches have been proposed for action recognition, the geometry of the tensor space is seldom treated as an important factor. In this paper, we stress that a data tensor is related to a tangent bundle on a special manifold. Using a manifold charting, we can extract discriminative information between actions. Data tensors are first factorized using high-order singular value decomposition (HOSVD); each factor is then projected onto a tangent space, and the intrinsic distance is computed from the tangent bundle for action classification. We examine a standard manifold charting and some alternative chartings on special manifolds, in particular the special orthogonal group, Stiefel manifolds, and Grassmann manifolds. Because the proposed paradigm frames classification as a nearest-neighbor search based on the intrinsic distance, prior training is unnecessary. We evaluate our method on three public action databases: the Cambridge gesture, the UMD Keck body gesture, and the UCF sport datasets. The empirical results show that our method is highly competitive with current state-of-the-art methods and robust to small alignment errors, while being simpler.
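The pipeline outlined above (factorize each video tensor, compare the factors on a manifold, classify by nearest neighbor) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: the factors are the leading left singular vectors of each mode unfolding, the Grassmann geodesic distance computed from principal angles stands in for the tangent-bundle intrinsic distance, and per-mode distances are combined as a root sum of squares. All function names and the `rank` parameter are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_factors(tensor, rank):
    """One orthonormal basis per mode: the leading left singular vectors
    of each unfolding (assumed choice of factor, for illustration only)."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    return factors


def grassmann_distance(a, b):
    """Geodesic distance between the column spans of two orthonormal bases,
    computed from their principal angles."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.linalg.norm(angles)


def video_distance(x, y, rank=3):
    """Combine per-mode distances as a root sum of squares (an assumption;
    this is a stand-in for the intrinsic distance, not the paper's formula)."""
    fx, fy = hosvd_factors(x, rank), hosvd_factors(y, rank)
    return np.sqrt(sum(grassmann_distance(p, q) ** 2 for p, q in zip(fx, fy)))


def classify(query, gallery, labels, rank=3):
    """1-nearest-neighbor classification: no training step is needed."""
    dists = [video_distance(query, g, rank) for g in gallery]
    return labels[int(np.argmin(dists))]
```

Calling `classify(query, gallery, labels)` with a list of equally shaped third-order arrays returns the label of the closest gallery video. Because the distance depends only on the subspaces spanned by the factors, rescaling a video's intensities leaves the result unchanged in this sketch.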