In this paper we propose a new criterion for choosing between a pair of classification systems of science that assign publications (or journals) to a set of scientific fields. Consider the standard normalization procedure in which field mean citations are used as normalization factors. We recommend system A over system B whenever the standard normalization procedure based on A performs better than when it is based on B. Since the evaluation can be made in terms of either system, the performance assessment requires a double test. In addition, since the assessment of two normalization procedures is generally biased in favor of the one based on the classification system used for evaluation purposes, a pair of classification systems should ideally be compared using a third, independent classification system for evaluation purposes. We illustrate this strategy by comparing a Web of Science (WoS) journal-level classification system, consisting of 236 journal subject categories, with two publication-level, algorithmically constructed classification systems consisting of 1,363 (G6) and 5,119 (G8) clusters. There are two main findings. (1) The G8 system dominates the G6 system. Therefore, when choosing between two classification systems at different granularity levels, we should use the system at the higher granularity level, since it typically exhibits better standard normalization performance. (2) The G8 system and the WoS journal-level system are non-comparable. Nevertheless, the G8-normalization procedure evaluated with the WoS system performs better than the WoS-normalization procedure evaluated with the G8 system. Furthermore, when the G6 system is used for evaluation purposes, the G8-normalization procedure performs better than the WoS-normalization procedure. We conclude that algorithmically constructed classification systems constitute a credible alternative to the WoS system and, by extension, to other journal-based classification systems.
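As a minimal sketch of the standard normalization procedure referred to above (the notation is ours, introduced only for illustration), under a classification system $A$ the normalized citation score of a publication is its citation count divided by the mean citation count of the field to which $A$ assigns it:
\[
  \tilde{c}^{\,A}_{i} \;=\; \frac{c_i}{\mu^{A}_{f_A(i)}},
  \qquad
  \mu^{A}_{f} \;=\; \frac{1}{\lvert P^{A}_{f} \rvert} \sum_{j \in P^{A}_{f}} c_j ,
\]
where $c_i$ is the citation count of publication $i$, $f_A(i)$ is the field assigned to $i$ by system $A$, and $P^{A}_{f}$ is the set of publications in field $f$ under $A$. The double test then evaluates the $A$-based and $B$-based normalized scores twice, once in terms of the fields of $A$ and once in terms of the fields of $B$, so that neither system is favored by the choice of evaluation system alone.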