Big data arises from many divergent types of sources, from physical (sensor/IoT) to social and cyber (web) types, rendering it messy, imprecise, and incomplete. Owing to its quantitative (volume and velocity) and qualitative (variety) challenges, big data appears to its users much like "the elephant to the blind men". A major paradigm shift in data mining and learning tools is therefore imperative: information from diversified sources must be integrated to unravel the knowledge hidden in massive and messy big data and, metaphorically speaking, let the blind men "see" the elephant. This talk will address yet another vital "V" paradigm: "Visualization". Visualization tools are meant to supplement (rather than replace) domain expertise (e.g., that of a cardiologist) and to provide a big picture that helps users formulate critical questions and subsequently postulate heuristic and insightful answers. For big data, the curse of high feature dimensionality raises grave concerns about computational complexity and over-training. In this talk, we shall explore various projection methods for dimension reduction, a prelude to the visualization of vectorial and non-vectorial data. A popular visualization tool for unsupervised learning is Principal Component Analysis (PCA), which aims at the best recoverability of the original data in the Euclidean Vector Space (EVS). We shall propose a supervised counterpart of PCA, Discriminant Component Analysis (DCA), formulated in a Canonical Vector Space (CVS). Simulations confirm that DCA far outperforms PCA, both numerically and visually. More importantly, via a proper interplay between anti-recoverability in the EVS and discriminant power in the CVS, DCA is promising for privacy protection when personal data are shared on the cloud in collaborative learning environments. We shall further extend PCA/DCA to kernel PCA/DCA for the purpose of visualizing non-vectorial data.
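To make the recoverability criterion concrete, the following is a minimal sketch of PCA projection for visualization; this is illustrative code, not the talk's implementation, and the function name `pca_project` is our own:

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components.

    PCA selects the orthogonal directions of maximal variance, which
    equivalently minimizes the Euclidean reconstruction error of the
    centered data (the "best recoverability" criterion in the EVS).
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :k]             # top-k principal directions
    return Xc @ W                           # k-dimensional projection
```

With k=2 or k=3, the projected points can be scattered directly for visual inspection; a supervised method such as DCA would instead choose directions by discriminant power rather than variance alone.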
The success of kernel methods depends critically on the kernel function used to represent the similarity of a pair of objects. For the visualization of non-vectorial and incompletely specified data, our experimental study points to a promising application of multi-kernels, including an imputed Gaussian RBF kernel and a partial correlation kernel.
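As one simple way to form an "imputed" Gaussian RBF kernel for incompletely specified vectors, missing entries can be filled by per-feature means before the kernel is evaluated; this is a hypothetical sketch using mean imputation, not necessarily the construction studied in the talk:

```python
import numpy as np

def imputed_rbf_kernel(X, gamma=1.0):
    """Gaussian RBF kernel matrix on data with missing entries (NaNs).

    Missing values are replaced by per-feature means (simple mean
    imputation, an assumption of this sketch), after which the standard
    RBF similarity exp(-gamma * ||x_i - x_j||^2) is computed.
    """
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(np.isnan(X), col_means, X)  # fill missing entries
    sq = np.sum(X_filled ** 2, axis=1)
    # Pairwise squared Euclidean distances via the expansion trick
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_filled @ X_filled.T
    return np.exp(-gamma * np.maximum(d2, 0.0))     # symmetric PSD matrix
```

The resulting kernel matrix can then feed kernel PCA/DCA, so that even incompletely specified objects obtain coordinates for visualization.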