In this paper(1), we extend Bayesian-Kullback Ying-Yang (BKYY) learning into a much broader Bayesian Ying-Yang (BYY) learning system by using different separation functionals in place of the Kullback divergence alone, and we elaborate the power of BYY learning as a general learning theory for parameter learning, scale selection, structure evaluation, regularization, and sampling design, briefly summarizing its relations to several existing learning methods and its development over the past years. We then present several new results on BYY learning. First, improved criteria are proposed for selecting the number of densities in finite mixtures and Gaussian mixtures, for selecting the number of clusters in MSE clustering, and for selecting the subspace dimension in PCA-related methods. Second, improved criteria are proposed for selecting the number of expert nets in the mixture-of-experts model and its alternative model, and for selecting the number of basis functions in RBF nets. Third, three categories of non-Kullback separation functionals, namely the Convex divergence, the Lp divergence, and the Decorrelation index, are suggested for BYY learning as alternatives for learning models based on the Kullback divergence, and some of their interesting properties are discussed. As examples, the EM algorithms for finite mixtures, mixture of experts, and its alternative model are derived with the Convex divergence.
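For reference, the separation functionals contrasted above can be illustrated with standard definitions (a sketch only; here p and q abbreviate the two joint densities being matched, and the paper's exact notation may differ):
\[
\mathrm{KL}(p\,\|\,q) \;=\; \int p(u)\,\ln\frac{p(u)}{q(u)}\,du,
\qquad
D_f(p\,\|\,q) \;=\; \int q(u)\, f\!\left(\frac{p(u)}{q(u)}\right) du,
\]
where f is any convex function with f(1) = 0. Taking f(r) = r ln r recovers the Kullback divergence, so Kullback-based BKYY learning sits inside the convex-divergence family as a special case; an Lp-type divergence can analogously be built from \(\int |p(u)-q(u)|^{p}\,du\), though the precise form used in the paper should be consulted.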