Ethnicity information is an integral part of human identity and a useful attribute for applications ranging from video surveillance and targeted advertising to social media profiling. In recent years, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many visual recognition problems, and a few CNN-based approaches to ethnicity classification have been proposed [1], [2]. However, these approaches suffer from two limitations. (i) Most face datasets do not include ethnicity information, and those that do are typically small to medium in size, so they do not provide enough samples to train CNNs from scratch. (ii) The CNN methods usually treat ethnicity classification as a multi-class classification problem that outputs only a likelihood for each class label, and they do not exploit the intermediate activations of the CNN, which provide rich hierarchical features that can assist ethnicity classification. In view of this, this paper proposes a new hybrid supervised learning method for ethnicity classification that exploits both the classification strength of the CNN and the rich features extracted from the network. The method combines the soft likelihoods produced by the CNN classifier with the output of an image ranking engine that matches hierarchical features between the query image and the dataset images. A supervised hybrid learning scheme based on a Support Vector Machine (SVM) is then trained on the combined feature vectors to perform ethnicity classification. The proposed method is evaluated on a dataset of Bangladeshi, Chinese and Indian ethnic groups, where it outperforms the state-of-the-art methods [2], [3] by up to 3% in recognition accuracy.
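To make the idea of a hybrid feature vector concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the CNN's raw class scores (logits) and an intermediate-layer feature vector for the query are already available, and it stands in a hypothetical ranking engine that ranks dataset images by cosine similarity of their features and converts the top-k matches into per-class vote fractions. The function names (`softmax`, `ranking_scores`, `hybrid_feature`), the toy gallery and the choice of `k` are illustrative assumptions; the paper's actual ranking and matching scheme may differ.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def softmax(logits):
    """Turn raw CNN class scores into soft class likelihoods."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ranking_scores(query_feat, gallery, classes, k=3):
    """Hypothetical ranking engine: rank (feature, label) gallery entries by
    cosine similarity to the query, then return the fraction of the top-k
    matches that belong to each class."""
    ranked = sorted(gallery, key=lambda item: cosine(query_feat, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return [votes[c] / k for c in classes]

def hybrid_feature(logits, query_feat, gallery, classes, k=3):
    """Concatenate CNN soft likelihoods with ranking-based class scores."""
    return softmax(logits) + ranking_scores(query_feat, gallery, classes, k)

# Toy usage with made-up 2-D "intermediate features" (illustrative only).
classes = ["Bangladeshi", "Chinese", "Indian"]
gallery = [([1.0, 0.0], "Chinese"), ([0.9, 0.1], "Chinese"),
           ([0.0, 1.0], "Indian"), ([0.1, 0.9], "Bangladeshi")]
vec = hybrid_feature([2.0, 1.0, 0.5], [1.0, 0.05], gallery, classes, k=3)
print(vec)  # 3 soft likelihoods followed by 3 ranking scores
```

In this sketch each query yields a vector of length twice the number of classes. A set of such vectors with their ground-truth labels would then be the training input for the SVM stage, for example via scikit-learn's `SVC().fit(X, y)`; the kernel and hyperparameters the paper uses are not stated in this section.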