Neuro-fuzzy systems combine the strengths of artificial neural networks and fuzzy systems. They are easily trainable and provide a certain level of interpretability. Their performance has been assessed in different application domains, and many attempts have been made to improve it using ensemble learning. However, to the best of our knowledge, no study has investigated the performance of heterogeneous neuro-fuzzy ensembles in a medical context. In this study, we constructed, evaluated, and compared the performance of 26 heterogeneous neuro-fuzzy ensembles on four medical datasets. The five single classifiers used were based on the Takagi-Sugeno-Kang (TSK) and Mamdani fuzzy inference systems. The metrics employed to measure the performance of the ensemble classifiers were accuracy, precision, and recall. Additionally, the Borda count method and the Scott-Knott statistical test were used to rank and cluster the classifiers, respectively. The results show that ensemble classifiers rarely outperform their base classifiers. Moreover, ensembles composed of TSK base classifiers performed best. In addition, ensembles comprising four base learners achieved the best performance. Finally, no ensemble classifier achieved high performance values across all four datasets.
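To illustrate the ranking step, the following is a minimal sketch of a Borda count over performance metrics. It assumes that each metric acts as one voter and that a classifier ranked in position p (0-based) among n classifiers receives n - 1 - p points; the abstract does not specify the exact variant used in the study. The function name `borda_count`, the classifier labels, and all numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List, Tuple


def borda_count(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, int]]:
    """Rank classifiers by Borda points, treating each metric as one voter.

    scores[classifier][metric] is a performance value (higher is better).
    Assumption: ties within a metric are broken by insertion order,
    because Python's sort is stable.
    """
    classifiers = list(scores)
    metrics = sorted({m for per_clf in scores.values() for m in per_clf})
    n = len(classifiers)
    points = {clf: 0 for clf in classifiers}

    for metric in metrics:
        # Best classifier first for this metric.
        ranked = sorted(classifiers, key=lambda c: scores[c][metric], reverse=True)
        for position, clf in enumerate(ranked):
            points[clf] += n - 1 - position

    # Highest total first.
    return sorted(points.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores for illustration only (not taken from the study).
example_scores = {
    "ensemble_A": {"accuracy": 0.90, "precision": 0.85, "recall": 0.80},
    "ensemble_B": {"accuracy": 0.88, "precision": 0.90, "recall": 0.82},
    "ensemble_C": {"accuracy": 0.80, "precision": 0.70, "recall": 0.75},
}

for name, total in borda_count(example_scores):
    print(name, total)
```

With these placeholder values, `ensemble_B` ranks first because it leads on two of the three metrics, even though `ensemble_A` has the highest accuracy. This is the property that makes a Borda count useful when no single metric should dominate the comparison.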