Hydraulic tomography (HT) has emerged as a cost-efficient approach to inferring the heterogeneity of geological media. Applying deep learning to hydraulic inverse problems has shown promising results, including approximating the inverse mapping from HT data to images of hydraulic conductivity. However, most studies first convert point-form HT data into images and then treat building the inverse mapping as an image-to-image task. This conversion requires data preprocessing, which can introduce human-induced errors, and extracting features from images imposes a greater computational burden. To address these shortcomings, we propose using sequence models to build the inverse mapping directly from the observation data space to the parameter space, thereby improving accuracy and reducing computational demands. We assessed sequence models and image-to-image regression networks in a synthetic steady-state HT experiment and compared them under different scenarios, including varying amounts of available data and the presence of data noise. Lastly, we applied our method to a synthetic transient HT experiment. Results showed that several sequence models, namely Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Transformer, perform similarly to the image-to-image regression networks (CNN2D and U-Net in this study) while requiring significantly lower computational costs. The Transformer-based model outperforms its closest competitor, achieving an R² of 0.9666 and an RMSE of 0.1467. Analysis of the attention score matrix also showed that the Transformer-based model offers greater interpretability. Applying our method to the synthetic transient HT experiment demonstrated the flexibility of sequence models.
Hydrogeologists should prioritize the characteristics of available data when selecting between these two methods and note that data noise can significantly compromise the efficacy of both approaches.
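The abstract does not describe the network architecture, so what follows is only a minimal, hypothetical sketch of the core idea. It assumes each pumping test contributes one token: the vector of heads (or drawdowns) recorded at the observation wells. The tests are fed as a sequence, with no interpolation onto an image grid. A single-head self-attention layer produces an attention score matrix over the pumping tests, which is the kind of matrix the abstract says was analyzed for interpretability. The class name `TinyHTAttentionInverter`, the helpers `r2_score` and `rmse`, the mean pooling, the layer sizes, and the random untrained weights are all illustrative assumptions. None of them come from the study.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyHTAttentionInverter:
    """Hypothetical single-head self-attention model that maps a sequence of
    pumping-test records (point-form HT data) to a flattened log-K field.
    Weights are random and untrained; this only illustrates data flow and shapes."""

    def __init__(self, n_obs_wells, n_cells, d_model=16, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.d_model = d_model
        self.W_embed = rng.normal(0.0, scale, (n_obs_wells, d_model))
        self.W_q = rng.normal(0.0, scale, (d_model, d_model))
        self.W_k = rng.normal(0.0, scale, (d_model, d_model))
        self.W_v = rng.normal(0.0, scale, (d_model, d_model))
        self.W_out = rng.normal(0.0, scale, (d_model, n_cells))

    def forward(self, heads):
        # heads: (n_tests, n_obs_wells); each row (one pumping test) is a token.
        x = heads @ self.W_embed                          # (n_tests, d_model)
        q, k, v = x @ self.W_q, x @ self.W_k, x @ self.W_v
        # Attention score matrix: row i shows how test i weights every test j.
        scores = softmax(q @ k.T / np.sqrt(self.d_model))  # (n_tests, n_tests)
        context = scores @ v                              # (n_tests, d_model)
        pooled = context.mean(axis=0)                     # (d_model,)
        log_k = pooled @ self.W_out                       # (n_cells,)
        return log_k, scores


def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    # Root-mean-square error.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Usage with made-up sizes: 8 pumping tests, 24 observation wells, 32 x 32 grid.
model = TinyHTAttentionInverter(n_obs_wells=24, n_cells=32 * 32)
heads = np.random.default_rng(1).normal(size=(8, 24))
log_k, scores = model.forward(heads)
```

With trained weights, the rows of `scores` could be inspected to see which pumping tests the model relies on for each test. The input size scales with the number of measurements rather than with an interpolated image, which is consistent with the lower computational cost the abstract reports for sequence models.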