The consequential aspect of validity concerns the actual and potential consequences of test score use, particularly sources of invalidity related to fairness, bias, injustice, and inequity. Differential Item Functioning (DIF) analysis examines test items to evaluate the fairness and validity of educational tests. Moreover, gender is frequently cited as a source of construct-irrelevant variance: if gender exerts a substantial influence on test items, the result is bias. Against this background, the present study investigates the validity of a high-stakes test and the role of gender as a source of bias across the subtests of a language proficiency test. To this end, the Rasch model was used to detect biased items and to examine construct-irrelevant factors. For the DIF analysis, the Rasch model was applied to the responses of 5,000 participants randomly selected from the pool of examinees taking the National University Entrance Exam for Foreign Languages (NUEEFL) as a university entrance requirement for English language studies (i.e., English Literature, Teaching, and Translation). The findings reveal that the test scores are not free of construct-irrelevant variance, and several misfitting items were revised following the suggestions of the fit statistics. Overall, the fairness of the NUEEFL was not confirmed. The results of this psychometric assessment could benefit test designers, stakeholders, administrators, and teachers, and the study recommends that future administrations employ standardized, bias-free tests and instructional materials.
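For reference, a minimal sketch of the model underlying such an analysis (the notation here is illustrative and not reproduced from the study): in the dichotomous Rasch model, the probability that person n answers item i correctly is

P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}, \qquad \mathrm{DIF}_i = b_i^{(F)} - b_i^{(M)},

where \theta_n is the person's ability, b_i is the item's difficulty, and b_i^{(F)}, b_i^{(M)} are the difficulties calibrated separately in the female and male subsamples. In this standard separate-calibration approach, a contrast that is large relative to its standard error, t_i = \mathrm{DIF}_i / \sqrt{SE_{i,F}^{2} + SE_{i,M}^{2}}, flags item i as functioning differently across gender groups.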