The Dirichlet distribution is widely used in the context of mixture models. Despite its flexibility, it suffers from limitations such as a restrictive covariance structure and a strict coupling between its mean and variance. In this work, a generalization of the Dirichlet distribution, the Nested Dirichlet distribution, is introduced in the context of finite mixture models; thanks to its hierarchical structure, it provides greater flexibility and overcomes these drawbacks. Model learning is based on the generalized expectation-maximization algorithm, where parameters are initialized with the method of moments and estimated through iterative Newton-Raphson updates. Moreover, the minimum message length criterion is proposed to determine the number of components that best describes the data clusters under the finite mixture model. The Nested Dirichlet distribution is shown to be a member of the exponential family, which offers several advantages, such as closed-form expressions for several probabilistic distances. The performance of the Nested Dirichlet mixture model is compared with that of the Dirichlet mixture model, the generalized Dirichlet mixture model, and a Convolutional Neural Network as a deep learning baseline, and the proposed framework is validated through this comparison on challenging datasets. The hierarchical structure of the model is further applied to challenging real-world tasks such as hierarchical cluster analysis and hierarchical feature learning, showing a significant improvement in terms of accuracy.