Eight independent variables (differential item functioning [DIF] detection method, purification procedure, item response model, mean latent trait difference between groups, test length, DIF pattern, magnitude of DIF, and percentage of DIF items) were manipulated, and two dependent variables (Type I error and power) were assessed through simulations. Results showed that the Type I error of the Mantel and generalized Mantel-Haenszel (GMH) methods was determined by the average signed area between the item characteristic curves of the reference and focal groups, not by the percentage of DIF items in the test. As long as the average signed area was close to zero, both methods maintained control of their Type I error, even when the percentage of DIF items was very high (e.g., 40%). The Mantel method yielded higher power than the GMH method under all DIF patterns except the balanced ones. The two-stage Mantel method is recommended if a single method is to be used.
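
The role of the average signed area can be illustrated with a minimal numerical sketch. It is not the simulation design of the study: it assumes dichotomous items under a two-parameter logistic model (the Mantel and GMH methods are typically applied to polytomous items), uniform DIF expressed as a difficulty shift, a reference-minus-focal sign convention, and hypothetical item parameters. The function names `icc`, `signed_area`, and `average_signed_area` are introduced here for illustration only. The sketch compares two 10-item tests that each contain 40% DIF items, one with a balanced pattern and one with a one-sided pattern.

```python
import math


def icc(theta, a, b):
    """Item characteristic curve of an assumed 2PL item: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def signed_area(a, b_ref, b_focal, lo=-6.0, hi=6.0, n=2400):
    """Trapezoidal approximation of the integral of
    ICC_reference(theta) - ICC_focal(theta) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        theta = lo + i * step
        diff = icc(theta, a, b_ref) - icc(theta, a, b_focal)
        # End points get half weight under the trapezoidal rule.
        total += (0.5 if i in (0, n) else 1.0) * diff
    return total * step


def average_signed_area(items):
    """Mean signed area over all items; each item is (a, b_ref, b_focal)."""
    return sum(signed_area(a, br, bf) for a, br, bf in items) / len(items)


# Hypothetical 10-item tests: six DIF-free items plus four DIF items (40%).
dif_free = [(1.0, 0.0, 0.0)] * 6

# Balanced pattern: two items favor the reference group, two favor the focal group.
balanced = dif_free + [(1.0, 0.0, 0.5)] * 2 + [(1.0, 0.0, -0.5)] * 2

# One-sided pattern: all four DIF items favor the reference group.
one_sided = dif_free + [(1.0, 0.0, 0.5)] * 4

print("balanced :", average_signed_area(balanced))
print("one-sided:", average_signed_area(one_sided))
```

Under these assumptions the two tests share the same percentage of DIF items, yet the balanced test has an average signed area of essentially zero because the positive and negative areas cancel, whereas the one-sided test does not. This is the quantity the results identify as governing Type I error control.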