Semantic segmentation is one of the most fundamental techniques for visual intelligence and plays a vital role in indoor service-robot tasks such as scene understanding, autonomous navigation, and dexterous manipulation. However, indoor environments pose great challenges for existing segmentation techniques: complex overlaps, heavy occlusions, and cluttered scenes containing objects of varied shapes and scales lead to the loss of edge information and insufficient segmentation accuracy. Moreover, most semantic segmentation networks are too complex to deploy on mobile robot platforms. It is therefore important to keep the number of network parameters as small as possible while improving the detection of meaningful edges in indoor scenes. In this paper, we present a novel indoor scene semantic segmentation method that refines segmentation edges and strikes a balance between accuracy and model complexity for indoor service robots. Our approach systematically combines dilated convolution with the rich convolutional features from the intermediate layers of Convolutional Neural Networks (CNNs), based on two motivations: (1) the intermediate hidden layers of a CNN contain abundant information that is potentially useful for edge detection but is no longer present in the later layers of traditional architectures; (2) dilated convolution can enlarge the receptive field and capture multi-scale feature information without reducing resolution or introducing additional parameters. We therefore propose a new end-to-end Multi-Scale Convolutional Features (MSCF) network that integrates dilated convolution with the rich convolutional features extracted from the intermediate layers of a traditional CNN. Finally, the resulting approach is extensively evaluated on the widely used indoor datasets SUN RGB-D and NYUDv2, and shows promising improvements over state-of-the-art baselines, both qualitatively and quantitatively.
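To make the second motivation concrete, the following minimal PyTorch sketch (illustrative only, not the MSCF implementation; the 64-channel feature map and layer sizes are assumptions) shows that increasing the dilation rate of a 3x3 convolution enlarges its effective receptive field while the output resolution and parameter count stay fixed.

```python
# Minimal sketch of the dilated-convolution property cited in the abstract:
# same kernel size, same parameter count, same output resolution,
# but a larger receptive field as the dilation rate grows.
# (Illustrative code, not the paper's MSCF network.)
import torch
import torch.nn as nn

# Hypothetical intermediate CNN feature map: batch 1, 64 channels, 32x32.
x = torch.randn(1, 64, 32, 32)

for d in (1, 2, 4):
    # padding=d keeps the spatial size of a 3x3 conv with dilation d unchanged.
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=d, padding=d)
    y = conv(x)
    n_params = sum(p.numel() for p in conv.parameters())
    rf = 2 * d + 1  # a single 3x3 kernel with dilation d spans (2d+1)x(2d+1) pixels
    print(f"dilation={d}: output={tuple(y.shape)}, params={n_params}, receptive_field={rf}x{rf}")
```

Running this prints identical output shapes (1, 64, 32, 32) and identical parameter counts for all three dilation rates, while the receptive field grows from 3x3 to 9x9, which is the property that lets a network aggregate multi-scale context at no extra cost.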