The paper discusses several empirical studies reported in the literature that aimed to evaluate the benefits of using software engineering methods and tools. The discussion highlights a number of problems with the methodology of these studies. The main problems concern the difficulty of formulating the hypothesis to be tested, using surrogate measures, defining a control group, and minimising the effect of personalities. Most of these problems arise in many experimental situations, but the proper definition of a control group seems to be a particular issue for software experiments. The paper concludes with some guidelines for improving the organisation of empirical studies.