3D object detection plays an important role in autonomous driving. In contrast to early-stage methods that rely on single-modality data, multi-modality detectors, mostly LiDAR-camera based, have been widely studied in recent years. Fusion methods can be divided into three categories: early fusion, which fuses raw data from different modalities; middle fusion, which fuses intermediate multi-modal features through feature alignment; and late fusion, which fuses at the instance level after the detectors of each modality have produced their predictions. Current methods focus mainly on early and middle fusion, neglecting the considerable potential of late fusion. This paper shows that the performance of a 3D object detector can be improved by applying our proposed semantic consistency filter (SCF), a plug-and-play late fusion strategy, to existing models. SCF removes incorrect prediction boxes by computing their semantic inconsistency rate (SIR) against the corresponding 2D semantic masks produced by a semantic segmentation network. Extensive experiments with several different baselines demonstrate the effectiveness and versatility of SCF, indicating that late fusion may be a key to improving the performance of 3D object detectors.
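The abstract does not spell out how SIR is computed, so the following is only a minimal sketch of one plausible instantiation: SIR is taken here as the fraction of pixels inside the projected 2D box that do not belong to the predicted class in the semantic mask, and boxes whose SIR exceeds a threshold are discarded. The names `semantic_inconsistency_rate`, `scf_filter`, `box_mask`, and the value of `sir_threshold` are all illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def semantic_inconsistency_rate(box_mask: np.ndarray, semantic_mask: np.ndarray) -> float:
    """Fraction of pixels inside the projected 2D box (box_mask, boolean HxW)
    that are NOT labeled as the predicted class in semantic_mask (boolean HxW).
    This definition of SIR is an assumption for illustration."""
    inside = box_mask.sum()
    if inside == 0:
        # Box projects entirely outside the image: treat as maximally inconsistent.
        return 1.0
    consistent = np.logical_and(box_mask, semantic_mask).sum()
    return 1.0 - consistent / inside

def scf_filter(boxes, box_masks, semantic_masks, sir_threshold=0.7):
    """Hypothetical plug-and-play filter: keep only 3D prediction boxes whose
    SIR against the 2D semantic mask stays below sir_threshold."""
    kept = []
    for box, box_mask, sem_mask in zip(boxes, box_masks, semantic_masks):
        if semantic_inconsistency_rate(box_mask, sem_mask) < sir_threshold:
            kept.append(box)
    return kept
```

Because such a filter only consumes the final predictions and an image-space segmentation, it can in principle be attached after any existing 3D detector without retraining, which is what makes late fusion of this kind plug-and-play.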