Facial expression recognition (FER) is a challenging image-processing task in complex real-world scenarios, where images are captured under varying lighting conditions, with facial occlusions, and across a diverse range of facial orientations. To address this issue, this paper proposes a novel twinned attention network (Twinned-Att) for efficient FER in occluded images. The proposed Twinned-Att network consists of two separate modules: a holistic module (HM) and a landmark-centric module (LCM). The holistic module comprises a dual coordinate attention block (Dual-CA) and a cross convolution block (Cross-conv). The Dual-CA block learns positional, spatial, and contextual information by highlighting the most prominent characteristics of the facial regions. The Cross-conv block learns spatial inter-dependencies and correlations to identify complex relationships between facial regions. The LCM emphasizes smaller, distinct local regions while remaining resilient to occlusions. Extensive experiments were conducted to evaluate the efficacy of the proposed Twinned-Att. The network achieves accuracies of 86.92%, 85.64%, 78.40%, 69.82%, 64.71%, 85.52%, and 85.83% on the RAF-DB, FERPlus, FER-2013, FED-RO, SFEW 2.0, occluded RAF-DB, and occluded FERPlus datasets, respectively. The proposed Twinned-Att network is evaluated with several backbone networks, including ResNet-18, ResNet-50, and ResNet-152. It performs consistently well across these backbones, which highlights its ability to address the challenges of robust FER in images captured in complex real-world environments.
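To make the idea behind coordinate-style attention concrete, the following is a minimal, parameter-free sketch in plain Python. It is not the Dual-CA block described above, whose internal design is not specified here. It only illustrates the general principle of pooling a feature map separately along each spatial axis so that the resulting gates retain positional information. The function names `sigmoid` and `coordinate_gate` are hypothetical, and the learned 1x1 convolutions that a trainable coordinate attention layer would normally apply to the pooled descriptors are deliberately omitted.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_gate(feature_map):
    """Apply a parameter-free, coordinate-style gate to one channel.

    feature_map: list of H rows, each a list of W floats.
    Returns a new H x W list with every value rescaled by a row gate
    and a column gate. This is an illustrative assumption, not the
    paper's Dual-CA block.
    """
    height = len(feature_map)
    width = len(feature_map[0])

    # Pool along the width: one descriptor per row (keeps vertical position).
    row_desc = [sum(row) / width for row in feature_map]

    # Pool along the height: one descriptor per column (keeps horizontal position).
    col_desc = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]

    # A trainable layer would transform the descriptors with learned
    # 1x1 convolutions here; this sketch feeds them straight to the sigmoid.
    row_gate = [sigmoid(v) for v in row_desc]
    col_gate = [sigmoid(v) for v in col_desc]

    # Each position is scaled by the gate of its row and of its column.
    return [
        [feature_map[i][j] * row_gate[i] * col_gate[j] for j in range(width)]
        for i in range(height)
    ]


if __name__ == "__main__":
    toy_map = [[0.0, 0.0], [0.0, 4.0]]
    print(coordinate_gate(toy_map))
```

Because both gates lie strictly between 0 and 1, the sketch can only attenuate activations; rows and columns with larger average responses are attenuated less, which is the sense in which position-dependent regions are highlighted.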