The emerging Internet of Things (IoT) closely couples users and things, and the interactions between them generate massive context data in which preference information about time, space, and textual content is embedded. Traditional recommendation methods (e.g., movie, music, and location recommendation) rely on static intrinsic context information and do not consider real-time content or spatiotemporal features, so they fail to adapt to personalized recommendation in IoT. A novel, effective, and efficient recommendation method is therefore urgently needed to meet users' interests and needs in IoT. This paper focuses on mining users' things of interest in IoT by leveraging multidimensional context embedding. Specifically, to address the challenge that massive context data embed different kinds of user preference information, the paper employs Convolutional Neural Networks (CNN) to mine the intrinsic content information of things and learn their representations. To solve the real-time recommendation problem, the paper proposes a real-time multimodal model that embeds location, time, and instant content information to track the features of users and things. Furthermore, the paper proposes a matrix factorization-based framework that uses regularization to fuse the real-time context embedding and the intrinsic information embedding. The experimental results demonstrate that the proposed method, tailored to IoT, is adaptable and flexible and captures users' personalized preferences effectively.
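As an illustration of how a regularized matrix factorization could fuse two kinds of embedding, the following is a minimal sketch and not the paper's actual model. It assumes one plausible objective, in which each user factor is pulled toward a real-time context embedding and each thing factor toward an intrinsic content embedding (for example, a CNN output). The function names `fit_fused_mf` and `fused_loss`, the weights `lam_u` and `lam_v`, and the plain SGD training loop are all hypothetical choices made for this example.

```python
import numpy as np

def fused_loss(ratings, U, V, user_ctx, item_content, lam_u, lam_v):
    """Assumed objective: squared error plus two embedding-anchored penalties."""
    err = sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings)
    reg_u = lam_u * np.sum((U - user_ctx) ** 2)      # stay near context embedding
    reg_v = lam_v * np.sum((V - item_content) ** 2)  # stay near content embedding
    return 0.5 * (err + reg_u + reg_v)

def fit_fused_mf(ratings, user_ctx, item_content,
                 lam_u=0.1, lam_v=0.1, lr=0.02, epochs=200):
    """Fit user/thing factors by SGD on the observed (user, thing, rating) triples.

    user_ctx:     (n_users, k) real-time context embeddings (assumed given)
    item_content: (n_things, k) intrinsic content embeddings (assumed given)
    """
    # Start the latent factors at the embeddings they are regularized toward.
    U = user_ctx.astype(float).copy()
    V = item_content.astype(float).copy()
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - U[i] @ V[j]
            # Gradients of the per-observation error plus the anchoring penalties.
            grad_u = -err * V[j] + lam_u * (U[i] - user_ctx[i])
            grad_v = -err * U[i] + lam_v * (V[j] - item_content[j])
            U[i] -= lr * grad_u
            V[j] -= lr * grad_v
    return U, V
```

Under these assumptions, a predicted preference is the dot product `U[i] @ V[j]`, and larger values of `lam_u` or `lam_v` keep the learned factors closer to the corresponding embeddings when interaction data are sparse.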