Breast cancer (BC) is one of the deadliest diseases affecting women. When the cancer is identified early, appropriate treatment can be provided to reduce the risk of death. Mammography is a widely used imaging modality for early BC detection, giving radiologists valuable information for planning treatment and improving outcomes. This article proposes an efficient system for detecting BC in mammographic images using a hyperparameter-tuned gated recurrent unit (HTGRU) with attention incorporated in a pre-trained model. The system comprises four steps: preprocessing, segmentation, feature extraction, and classification. Preprocessing applies Gaussian filtering for noise removal and contrast-limited adaptive histogram equalization (CLAHE) for contrast enhancement. Data augmentation is then performed on the preprocessed dataset to balance the benign and malignant classes, which prevents the network from producing biased results. Next, a deviation theory-based fuzzy c-means (DTFCM) algorithm segments the tumor regions from the preprocessed image. The most discriminant features are then extracted from the segmented tumor regions using a normalization-based attention module incorporated in a capsule network (NAMCN). Finally, the HTGRU classifies each image as benign, malignant, or normal. The system is evaluated on the Mammographic Image Analysis Society (MIAS) dataset and the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM), and the results demonstrate the proposed method's superiority over existing methods, with higher detection accuracy and lower false positive rates.
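To illustrate the idea behind the segmentation step, the following is a minimal sketch of standard fuzzy c-means clustering applied to pixel intensities. It is not the deviation theory-based variant (DTFCM) used by the proposed system, whose modifications are not described here. The function names `fuzzy_c_means` and `hard_labels`, their parameters, and the toy intensity values are assumptions made purely for illustration.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50):
    """Standard fuzzy c-means on 1-D intensities (NOT the DTFCM variant).

    values: list of pixel intensities; m: fuzzifier (> 1).
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which values[i] belongs to cluster j.
    """
    lo, hi = min(values), max(values)
    # Assumption: initialise centers evenly spaced over the intensity range.
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters for j in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)

    def compute_memberships(current_centers):
        rows = []
        for x in values:
            # Small epsilon avoids division by zero when x equals a center.
            dists = [abs(x - c) + 1e-9 for c in current_centers]
            rows.append([1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                         for d_j in dists])
        return rows

    for _ in range(n_iter):
        memberships = compute_memberships(centers)
        # Each center becomes the membership-weighted mean of the intensities.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))

    return centers, compute_memberships(centers)


def hard_labels(memberships):
    """Assign each pixel to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Toy example: dark background pixels and a bright candidate region.
pixels = [10, 12, 11, 9, 200, 205, 198, 202]
centers, memberships = fuzzy_c_means(pixels)
labels = hard_labels(memberships)
```

In a real pipeline the input would be the flattened intensities of a preprocessed mammogram, and the cluster with the brighter center would be taken as the candidate tumor region. Soft memberships are what distinguish fuzzy c-means from k-means: a pixel on a region boundary can belong partly to both clusters.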