Software-intensive systems have been growing in both size and complexity. Consequently, developers need better support for measuring and controlling software quality. In this context, software metrics aim to quantify different software quality aspects. However, the effectiveness of measurement depends on the definition of reliable metric thresholds, i.e., numbers that characterize a metric value as critical with respect to a quality aspect. Without proper metric thresholds, developers may struggle, for instance, to identify problematic software components that require correction. Based on a literature review, we identified several existing methods for deriving metric thresholds and traced their evolution. This evolution motivated us to propose a new method that incorporates the strengths of the existing ones. In this paper, we propose a novel benchmark-based method for deriving metric thresholds. We assess our method, called Vale's method, using metric thresholds derived with it to compose detection strategies for two well-known code smells, namely god class and lazy class. For this purpose, we analyze three benchmarks composed of multiple software product lines. In addition, we demonstrate our method in practice by applying it to a benchmark of 103 open-source Java systems. In the evaluation, we compare Vale's method with two state-of-the-practice threshold derivation methods selected as baselines, namely Lanza's method and Alves' method. Our results suggest that the proposed method provides more realistic and reliable thresholds, achieving better recall and precision in code smell detection than both baseline methods.