Lithological characterization plays a crucial role in geological studies, and outcrops are the primary source of geological information. Automatic lithology identification supports geological mapping and reduces the costs and risks of mapping less accessible outcrops. Outcrops are typically imaged with remote sensing techniques, which allows geological structures and lithologies to be identified through computer vision and machine learning (ML). In this context, convolutional neural networks (CNNs) have contributed significantly to lithological characterization in outcrop images. Recent advances include architectures based on residual and attention blocks, which improve on base CNN models, and transformer-based architectures, which have surpassed CNNs in various tasks. Taking a step further, we propose a novel architecture that incorporates Fourier operators. It builds upon the Transformer model and uses a sequential combination of Fourier neural operators (FNOs) and channelwise self-attention layers. We train the model with a transfer learning strategy: it is first trained on a texture dataset with 47 classes and then fine-tuned to classify five lithologies in our custom dataset. These are sandstone, gray shale, brownish-gray shale, limestone, and laminated limestone, imaged at the Três Irmãos quarry in the Araripe Basin, an outcrop analogous to oil exploration reservoirs. The proposed architecture achieved an F1-score of up to 98%, outperforming reference CNN and ResNet models. This result holds promise for accurate lithological characterization, benefiting geological research and exploration efforts.
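
To make the sequential combination of a Fourier operator and channelwise self-attention concrete, the following NumPy sketch shows one possible forward pass through such a block. It is an illustration under stated assumptions, not the architecture evaluated here: the function names, weight layouts, residual connections, and the choice to keep only a single low-frequency corner of the spectrum are all hypothetical simplifications, and no training or fine-tuning is shown.

```python
import numpy as np


def spectral_conv2d(x, weights, modes):
    """Simplified FNO-style spectral convolution for x of shape (C, H, W).

    weights: complex array of shape (C_in, C_out, modes, modes) (assumed layout).
    Only the lowest modes x modes frequencies are kept; fuller FNO
    implementations also keep the negative-frequency corner.
    """
    _, H, W = x.shape
    x_ft = np.fft.rfft2(x, axes=(-2, -1))        # (C, H, W // 2 + 1)
    out_ft = np.zeros_like(x_ft)                 # discarded modes stay zero
    # Mix channels independently at each retained frequency.
    out_ft[:, :modes, :modes] = np.einsum(
        "ihw,iohw->ohw", x_ft[:, :modes, :modes], weights
    )
    return np.fft.irfft2(out_ft, s=(H, W), axes=(-2, -1))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_self_attention(x, wq, wk, wv):
    """Self-attention in which each channel acts as one token (assumed form)."""
    C, H, W = x.shape
    tokens = x.reshape(C, H * W)                 # (C, H*W)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (C, C) channel affinities
    return (attn @ v).reshape(C, H, W)           # wv must map H*W -> H*W


def fourier_attention_block(x, params, modes):
    # Residual connections are an assumption borrowed from Transformer blocks.
    x = x + spectral_conv2d(x, params["spectral"], modes)
    x = x + channel_self_attention(x, params["wq"], params["wk"], params["wv"])
    return x


# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
C, H, W, modes, d = 4, 8, 8, 3, 16
params = {
    "spectral": 0.1 * (rng.standard_normal((C, C, modes, modes))
                       + 1j * rng.standard_normal((C, C, modes, modes))),
    "wq": 0.1 * rng.standard_normal((H * W, d)),
    "wk": 0.1 * rng.standard_normal((H * W, d)),
    "wv": 0.1 * rng.standard_normal((H * W, H * W)),
}
x = rng.standard_normal((C, H, W))
y = fourier_attention_block(x, params, modes)
print(y.shape)  # (4, 8, 8)
```

In this sketch the spectral layer mixes information globally across the image through its low-frequency content, while the attention step reweights channels against one another; a classifier head and the two-stage training (texture pretraining, then lithology fine-tuning) would sit on top of a stack of such blocks.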