Bidirectional Encoder Representations from Transformers (BERT), a revolutionary model in natural language processing (NLP), has significantly impacted text-related tasks, including text classification. Several BERT models have been developed for the Arabic language. While many studies have compared their overall performance on text classification tasks, none has investigated the relationship between their pretraining data and their performance. This study examines that relationship by utilizing ten models and evaluating them on eight diverse classification tasks using metrics such as accuracy and F1 score. The results reveal variations in performance across tasks that are attributable mainly to the models' pretraining corpora. The study emphasizes the impact of pretraining data size, quality, and diversity on model adaptability: models pretrained on narrower corpora, even larger ones, may not outperform those pretrained on more diverse datasets. Notably, domain-specific tasks such as medical and poetry classification revealed performance gaps relative to the original English BERT. These findings suggest that the pretraining approach for Arabic BERT models should be reevaluated, with a balance of quantity and quality in pretraining corpora spanning a variety of domains identified as crucial. The study provides insights into optimizing pretraining strategies to improve the performance and adaptability of Arabic BERT models across diverse text classification tasks, offering practical guidance for NLP researchers and practitioners. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
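
To make the evaluation setup concrete, the following is a minimal sketch of how one Arabic BERT checkpoint could be fine-tuned on a single classification task and scored with accuracy and macro F1. It assumes the Hugging Face transformers, datasets, and scikit-learn libraries; the checkpoint name, CSV file names, and hyperparameters are illustrative assumptions, not the paper's exact pipeline.

# Illustrative sketch only (not the paper's exact pipeline): fine-tune one Arabic BERT
# checkpoint on a single classification task and report accuracy and macro F1.
# The checkpoint name, CSV file names, and hyperparameters are assumptions.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "aubmindlab/bert-base-arabertv02"  # assumed checkpoint; any Arabic BERT can be swapped in

# Hypothetical CSV files with "text" and integer "label" columns for one task.
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Pad/truncate to a fixed length so the default data collator can batch examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Accuracy and macro-averaged F1, the kind of metrics used to compare models.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "macro_f1": f1_score(labels, preds, average="macro")}

num_labels = len(set(data["train"]["label"]))
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())

Repeating this loop over each of the ten checkpoints and eight task datasets would yield the sort of per-task accuracy and F1 comparison the study reports.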