Visible-infrared person re-identification (VI-reID) is a challenging task in security and video surveillance that aims to identify and match a person captured by multiple non-overlapping cameras. In recent years, reID has advanced notably owing to the development of transformer-based architectures. Although many existing methods emphasize learning both modality-specific and shared features, challenges remain in fully exploiting the complementary information between the infrared and visible modalities. Consequently, there is still room to improve retrieval performance by effectively understanding and integrating cross-modality semantic information. Moreover, existing designs often suffer from high model complexity and time-consuming training. To tackle these issues, we propose a novel transformer-based neural architecture search (TNAS) deep learning approach for effective VI-reID. To alleviate the modality gap, we first introduce a global-local transformer (GLT) module that captures features at both global and local levels across modalities, contributing to better feature representation and matching. Then, an efficient neural architecture search (NAS) module is developed to search for the optimal transformer-based architecture, which further enhances VI-reID performance. Additionally, we introduce a distillation loss and a modality discriminative (MD) loss that exploit the consistency between modalities to promote inter-modality separation between classes and intra-modality compactness within classes. Experimental results on two challenging benchmark datasets show that our model achieves state-of-the-art performance, outperforming existing VI-reID methods.
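To make the two key ideas above concrete, the following is a minimal, hypothetical sketch (not the paper's exact formulation) of a global-local feature head over ViT-style tokens and a modality discriminative loss that encourages inter-class separation and intra-class compactness across the visible and infrared modalities. Module names, dimensions, the part-pooling scheme, and the margin-based loss form are illustrative assumptions.

```python
# Illustrative sketch only; the GLT module and MD loss in the paper may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalHead(nn.Module):
    """Pools transformer patch tokens into one global feature and several local part features."""

    def __init__(self, dim=768, num_parts=4):
        super().__init__()
        self.num_parts = num_parts
        self.global_norm = nn.LayerNorm(dim)
        self.local_norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (B, 1 + N, dim), with a leading [CLS] token from a ViT-style backbone
        cls_token, patches = tokens[:, 0], tokens[:, 1:]
        global_feat = self.global_norm(cls_token)                      # (B, dim)
        # split the patch sequence into parts and average-pool each part
        parts = patches.chunk(self.num_parts, dim=1)
        local_feats = torch.stack([p.mean(dim=1) for p in parts], 1)   # (B, P, dim)
        local_feats = self.local_norm(local_feats)
        return global_feat, local_feats


def modality_discriminative_loss(vis_feat, ir_feat, labels, margin=0.3):
    """Pulls same-identity features together across modalities and pushes
    different identities apart; a simple margin-based proxy for the
    compactness/separation objective described in the abstract."""
    feats = F.normalize(torch.cat([vis_feat, ir_feat], dim=0), dim=1)
    ids = torch.cat([labels, labels], dim=0)
    dist = torch.cdist(feats, feats)                                   # (2B, 2B) pairwise distances
    same = ids.unsqueeze(0) == ids.unsqueeze(1)
    eye = torch.eye(len(ids), dtype=torch.bool, device=feats.device)
    pos = dist[same & ~eye].mean()                                     # intra-class compactness
    neg = F.relu(margin - dist[~same]).mean()                          # inter-class separation
    return pos + neg
```

In such a setup, the global and local features from both modalities would typically be concatenated (or supervised separately) for identity classification, with the MD-style loss applied to the pooled features of each visible-infrared mini-batch.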