Self-multiplier, or square calculator, design using Vedic formulas is an emerging trend in quantum-dot cellular automata (QCA) technology. However, an efficient coplanar design and a complete performance analysis are still lacking. This brief presents a coplanar QCA architecture for a 2-bit square calculator (proposed design-1, or PD1) using the Vedic sutra 'Urdhva Tiryagbhyam'. Furthermore, an optimized architecture (proposed design-2, or PD2) based on the E-shaped XOR gate and the majority gate (MV) is presented. Compared with the previous architecture, PD2 requires 17% fewer cells, 53% less area, and 25% lower latency. Likewise, the extended 4-bit architecture (proposed design-3, or PD3) improves cell count, covered area, and latency by 67%, 63%, and 62%, respectively. Compared with the best previous design, the area-delay, QCA-specific, and energy-delay costs of PD2 (PD3) are lower by factors of ~10 (~30), ~71 (~33), and ~64 (~83), respectively. Power dissipation also improves: the QCA-based designs PD2 (PD3) dissipate 1.41 × 10⁻⁶ mW (11.82 × 10⁻⁶ mW), whereas comparable CMOS-based designs dissipate 2.29 × 10⁻² mW (48 × 10⁻² mW). The comprehensive performance analyses are carried out using the QCADesigner, QCADesigner-E, and QCAPro tools.
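As an illustration of the 2-bit squaring logic, the following is a minimal behavioural sketch of the 'Urdhva Tiryagbhyam' (vertical and crosswise) steps applied to a 2-bit input. It is not the cell-level layout of PD1 or PD2, which the abstract does not specify. The function names (`majority`, `and_gate`, `xor_gate`, `square_2bit`, `bits_to_int`) are hypothetical, the AND gate is assumed to be a majority gate with one input fixed to 0, and the E-shaped XOR gate is modelled only by its truth table.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a: int, b: int) -> int:
    """AND modelled as a majority gate with one input fixed to 0 (assumption)."""
    return majority(a, b, 0)


def xor_gate(a: int, b: int) -> int:
    """Behavioural XOR only; the E-shaped QCA layout is not modelled here."""
    return a ^ b


def square_2bit(a1: int, a0: int) -> tuple[int, int, int, int]:
    """Square the 2-bit number (a1 a0); returns the bits (p3, p2, p1, p0)."""
    # Vertical step on the least significant bits: a0 * a0 = a0.
    p0 = a0
    # Crosswise step: a1*a0 + a0*a1 = 2*(a1*a0), so the sum bit is 0
    # and the carry into the next column is a1 AND a0.
    p1 = 0
    carry = and_gate(a1, a0)
    # Vertical step on the most significant bits: a1 * a1 = a1, plus the carry.
    p2 = xor_gate(a1, carry)
    p3 = and_gate(a1, carry)
    return (p3, p2, p1, p0)


def bits_to_int(bits: tuple[int, ...]) -> int:
    """Interpret a tuple of bits (most significant first) as an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


if __name__ == "__main__":
    for a in range(4):
        bits = square_2bit(a >> 1, a & 1)
        print(a, bits, bits_to_int(bits))
```

In this sketch the whole 2-bit square reduces to one AND, one XOR, and a second AND (which simplifies to the first, since a1 AND (a1 AND a0) equals a1 AND a0), which is consistent with the abstract's emphasis on XOR and majority gates as the building blocks.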