Facial expression is the most direct way for humans to express emotions and intentions, and facial expression recognition (FER) plays a crucial role in machine intelligence. Currently, two common ways to process images are: viewing facial images as Euclidean structures in space, and splitting the images into patches represented as a continuous sequence. However, faces in images are usually irregular and cannot be processed flexibly in the form of a grid or a sequence. Meanwhile, features unrelated to expression, such as the background, tend to have a negative impact and interfere with FER when features are extracted. To address these problems, an attentional visual graph neural network based FER method (AT-ViG) is proposed in this paper. First, AT-ViG represents the input face images as graphs and applies a pixel-based graph construction method to process the input images, addressing the inflexibility of grid- and sequence-based processing. Second, AT-ViG employs an attention mechanism to enhance feature extraction and minimize the influence of irrelevant features such as background and hair in facial images; this highlights the features relevant to FER and suppresses irrelevant ones, so that the model can effectively capture the correlations between different facial regions, reduce redundant information, and improve the accuracy and robustness of expression recognition. Finally, AT-ViG is experimentally validated on four commonly used expression recognition datasets, achieving accuracies of 99.60%, 92.03%, 74.35%, and 88.15% on CK+, Oulu-CASIA NIR&VIS, FER2013, and RAF-DB, respectively.
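The abstract does not give implementation details, so the following is only a minimal PyTorch sketch, not the authors' code, of the general idea it describes: treating image patches as graph nodes, connecting each node to its nearest neighbours in feature space, and aggregating neighbour features with an attention weighting. The class name `GraphAttentionLayer`, the neighbourhood size `k`, and the toy dimensions in the usage example are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Hypothetical single-head graph attention over k-nearest-neighbour nodes."""

    def __init__(self, dim, k=9):
        super().__init__()
        self.k = k  # number of neighbours per node (assumed value)
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, N, C) node features, one node per image patch/pixel group
        dist = torch.cdist(x, x)                         # (B, N, N) pairwise distances
        idx = dist.topk(self.k, largest=False).indices   # (B, N, k) neighbour indices
        neighbours = torch.gather(
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))  # (B, N, k, C)
        q = self.query(x).unsqueeze(2)                   # (B, N, 1, C)
        k_ = self.key(neighbours)                        # (B, N, k, C)
        v = self.value(neighbours)                       # (B, N, k, C)
        # attention weights over each node's neighbourhood
        attn = F.softmax((q * k_).sum(-1) / x.size(-1) ** 0.5, dim=-1)  # (B, N, k)
        return x + (attn.unsqueeze(-1) * v).sum(2)       # residual aggregation


if __name__ == "__main__":
    # Toy usage: a face image split into 14x14 patch nodes with 192-dim features.
    nodes = torch.randn(2, 14 * 14, 192)
    out = GraphAttentionLayer(192)(nodes)
    print(out.shape)  # torch.Size([2, 196, 192])
```

In this sketch the attention weights decide how strongly each neighbour contributes to a node's updated feature, which loosely mirrors the abstract's goal of emphasising expression-relevant regions while down-weighting irrelevant ones such as background or hair.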