We propose a low-complexity joint channel estimation (CE) and three-stage iterative demapping-decoding scheme for near-capacity coherent space-time shift keying (CSTSK) based multiple-input multiple-output (MIMO) systems. In the proposed scheme, only a minimum number of space-time shift keying training blocks is employed for generating an initial least-squares channel estimate, which is then used for initial data detection. As in conventional three-stage turbo detection, the soft information is first exchanged a number of times within the inner turbo loop between the unity-rate-code (URC) decoder and the CSTSK soft-demapper, and the information gleaned from the inner URC decoder is then iteratively exchanged with the outer decoder in the outer turbo loop. Our CE scheme is embedded into the outer turbo loop and exploits the a posteriori information produced by the CSTSK soft-demapper to select a sufficient number of high-quality decisions, so that only these reliable decisions are used for CE. Since the CE is embedded into the iterative three-stage demapping-decoding process, no additional iterative loop is required for exchanging information between the decision-directed channel estimator and the three-stage turbo detector. Hence, the computational complexity of the proposed joint CE and three-stage turbo detection remains similar to that of the three-stage turbo detection-decoding scheme operating with a given channel estimate. Moreover, our proposed low-complexity semi-blind scheme is capable of approaching the optimal maximum likelihood turbo detection performance attained with the aid of perfect channel state information, while requiring the same low number of turbo iterations as the latter, as confirmed by our extensive simulation results.
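The two CE ingredients named above, a least-squares estimate and the selection of reliable decisions from a posteriori information, can be illustrated with a minimal NumPy sketch. It is not the scheme's actual implementation: the block model Y = HS + V, the function names, the probability threshold and the `min_blocks` parameter are all assumptions introduced here for illustration, and the selection rule used in the full scheme may differ.

```python
import numpy as np


def ls_channel_estimate(Y, S):
    """Least-squares estimate of H under the assumed model Y = H S + V.

    Y: (n_rx, L) received samples; S: (n_tx, L) known or detected symbols.
    For full-row-rank S this equals Y S^H (S S^H)^{-1}.
    """
    return Y @ np.linalg.pinv(S)


def select_reliable_blocks(app, threshold=0.99):
    """Indices of blocks whose largest a posteriori probability reaches
    the (hypothetical) threshold. app: (n_blocks, n_candidates)."""
    return np.flatnonzero(app.max(axis=1) >= threshold)


def decision_directed_estimate(Y_blocks, S_candidates, app,
                               threshold=0.99, min_blocks=1):
    """Re-estimate H from reliable hard decisions only.

    Y_blocks: (n_blocks, n_rx, T) received blocks.
    S_candidates: (n_candidates, n_tx, T) legitimate space-time blocks.
    Returns None when fewer than min_blocks decisions are reliable.
    """
    idx = select_reliable_blocks(app, threshold)
    if idx.size < min_blocks:
        return None
    hard = app[idx].argmax(axis=1)  # most probable candidate per selected block
    Y = np.concatenate(list(Y_blocks[idx]), axis=1)      # (n_rx, k*T)
    S = np.concatenate(list(S_candidates[hard]), axis=1)  # (n_tx, k*T)
    return ls_channel_estimate(Y, S)
```

In this sketch the initial estimate would come from calling `ls_channel_estimate` on the training blocks, and `decision_directed_estimate` would be called once per outer turbo iteration with the soft-demapper's a posteriori probabilities, so that no separate estimator-detector loop is introduced. If the selected blocks do not span all transmit dimensions, the pseudo-inverse returns a minimum-norm solution, which is why a sufficient number of reliable decisions is needed.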