Shape similarity and spirit similarity are two common artistic modes in concept visualization. The choice between them depends on designers' subjective preference and judgment, which poses risks for semantic communication. This study used real image-concrete word pairs as the basis and contrasted four kinds of multimodal mappings (shape similarity-concrete concept, shape similarity-abstract concept, spirit similarity-concrete concept, and spirit similarity-abstract concept) to compare their matching differences using the S1 (picture)-S2 (word) paradigm. The behavioral results showed that shape similarity had advantages over spirit similarity in both matching rate and reaction time, and that this advantage was larger for concrete words than for abstract words. The ERP results showed that the N1, P2, and N400 components exhibited effects similar to the behavioral results, whereas the spirit similarity-concrete concept mapping elicited the largest P600 positivity, suggesting complex mechanisms of semantic integration and concreteness effects in multimodal mappings. This study indicates that a concrete concept should be visualized according to its appearance rather than its most striking feature or function, whereas the visualization of an abstract concept shows less difference between the two modes after a concreteness transition.