Policy evaluation (PE) is a critical sub-problem in reinforcement learning: it estimates the value function of a given policy, and its estimates can then be used for policy improvement. However, current PE methods still suffer from limitations such as low sample efficiency and local convergence, especially on complex tasks. In this study, a novel PE algorithm called Least-Squares Truncated Temporal-Difference learning (LST²D) is proposed. In LST²D, an adaptive truncation mechanism is designed that effectively combines the fast initial convergence of Least-Squares Temporal-Difference learning (LSTD) with the asymptotic convergence of Temporal-Difference learning (TD). Two feature pre-training methods are then utilised to improve the approximation ability of LST²D. Furthermore, an Actor-Critic algorithm based on LST²D and pre-trained feature representations (ACLPF) is proposed, in which LST²D is integrated into the critic network to improve learning and prediction efficiency. Comprehensive simulation studies were conducted on four robotic tasks, and the results illustrate the effectiveness of LST²D. The proposed ACLPF algorithm outperformed DQN, ACER and PPO in terms of sample efficiency and stability, demonstrating that LST²D can be applied to online learning control problems when incorporated into the actor-critic architecture.
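As a rough illustration of the kind of mechanism the abstract describes, the sketch below combines an LSTD-style batch solve with incremental TD(0) updates for linear policy evaluation. It is a minimal sketch under stated assumptions: the class name TruncatedTDValue, the fixed switch point truncate_after, and the regularised linear solve are illustrative choices and do not reproduce the paper's actual adaptive truncation rule or its feature pre-training methods.

```python
import numpy as np

def lstd_solve(A, b, reg=1e-3):
    """Closed-form LSTD weights: theta = (A + reg*I)^{-1} b (ridge-regularised)."""
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)

class TruncatedTDValue:
    """Toy linear policy-evaluation sketch: LSTD-style batch solves early on
    (fast initial convergence), then plain semi-gradient TD(0) afterwards.
    The fixed switch point `truncate_after` stands in for the paper's
    adaptive truncation mechanism, which is not specified here."""

    def __init__(self, n_features, gamma=0.99, alpha=0.05, truncate_after=500):
        self.theta = np.zeros(n_features)           # linear value weights
        self.A = np.zeros((n_features, n_features)) # LSTD statistics
        self.b = np.zeros(n_features)
        self.gamma, self.alpha = gamma, alpha
        self.truncate_after = truncate_after
        self.t = 0

    def value(self, phi):
        return float(self.theta @ phi)

    def update(self, phi, reward, phi_next, done):
        self.t += 1
        target_phi = np.zeros_like(phi_next) if done else phi_next
        if self.t <= self.truncate_after:
            # LSTD phase: accumulate A and b, then re-solve for theta.
            self.A += np.outer(phi, phi - self.gamma * target_phi)
            self.b += reward * phi
            self.theta = lstd_solve(self.A, self.b)
        else:
            # TD phase: incremental TD(0) update on the same weights.
            delta = reward + self.gamma * self.value(target_phi) - self.value(phi)
            self.theta += self.alpha * delta * phi
```

In this toy version the truncation point is a hyperparameter; the appeal of an adaptive rule, as the abstract suggests, is to hand over from the sample-efficient LSTD estimate to cheap TD updates without tuning that point by hand.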