In this paper, hidden Markov models (HMMs) with bounded state durations (HMM/BSDs) are proposed to model the state durations of HMMs explicitly and to capture the temporal structure of speech signals in a simple and direct yet effective way. Instead of the Poisson, gamma, or other distributions proposed previously for modeling state durations, the duration of each state of an HMM/BSD is simply constrained, in the recognition phase, by a lower and an upper bounding parameter; these bounding parameters can be estimated during the training phase. A series of speaker-dependent experiments was conducted using all 408 highly confusing first-tone Mandarin syllables as the example vocabulary. In the discrete case, the recognition rate of the HMM/BSDs (78.5%) is 9.0%, 6.3%, and 1.9% higher than that of conventional HMMs and of HMMs with Poisson- and gamma-distributed state durations, respectively. In the continuous case (partitioned Gaussian mixture modeling), the recognition rates of the HMM/BSDs (88.3% with 1 mixture, 88.8% with 3 mixtures, and 89.4% with 5 mixtures) are 6.3%, 5.0%, and 5.5% higher than those of conventional HMMs; they are also 5.9% (1 mixture) and 3.9% (3 mixtures) higher than those of HMMs with Poisson-distributed state durations, and 3.1% (1 mixture) and 1.8% (3 mixtures) higher than those of HMMs with gamma-distributed state durations. As for computational complexity and recognition speed, the proposed modeling method requires much less computation than HMMs with Poisson- or gamma-distributed state durations.
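The bounded-duration idea can be sketched as follows: during recognition, a state hypothesis is admitted only if its duration lies between that state's lower and upper bounds. Below is a minimal illustrative sketch, not the paper's exact formulation: a segmental Viterbi search for a strictly left-to-right discrete HMM in which states are visited in order, self-loop transition probabilities are folded into the duration bounds, and all function and variable names are hypothetical.

```python
import math

def bounded_duration_viterbi(obs, log_b, d_min, d_max):
    """Best segmentation of obs under per-state duration bounds.

    obs          : list of T observation symbol indices
    log_b        : log_b[j][o] = log emission prob of symbol o in state j
    d_min, d_max : per-state lower/upper duration bounds (in frames)
    Returns (best log-likelihood, list of per-state durations).
    """
    T, N = len(obs), len(log_b)
    NEG = float("-inf")
    # cum[j][t] = sum of log emission probs of obs[0:t] under state j,
    # so any segment score is an O(1) difference of cumulative sums.
    cum = [[0.0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(T):
            cum[j][t + 1] = cum[j][t] + log_b[j][obs[t]]
    # score[j][t] = best log-prob of covering obs[0:t] with states 0..j,
    # back[j][t] = duration of state j in that best path.
    score = [[NEG] * (T + 1) for _ in range(N)]
    back = [[0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(1, T + 1):
            # only durations inside [d_min[j], d_max[j]] are admitted
            for d in range(d_min[j], min(d_max[j], t) + 1):
                if j == 0:
                    prev = 0.0 if t - d == 0 else NEG
                else:
                    prev = score[j - 1][t - d]
                cand = prev + cum[j][t] - cum[j][t - d]
                if cand > score[j][t]:
                    score[j][t], back[j][t] = cand, d
    # backtrack the per-state durations
    durs, t = [], T
    for j in range(N - 1, -1, -1):
        durs.append(back[j][t])
        t -= back[j][t]
    durs.reverse()
    return score[N - 1][T], durs

# Toy example: state 0 prefers symbol 0, state 1 prefers symbol 1.
log_b = [[math.log(0.9), math.log(0.1)],
         [math.log(0.1), math.log(0.9)]]
ll, durs = bounded_duration_viterbi([0, 0, 1, 1], log_b, [1, 1], [3, 3])
# durs -> [2, 2]: each state is assigned the two frames it explains best.
```

Because the bounds simply prune the inner duration loop, the search adds no per-frame evaluation of a parametric duration density, which is consistent with the abstract's claim of lower computational cost than Poisson- or gamma-duration models.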