To improve data availability for distribution network planning and intelligent analysis while reducing data caching cost, to analyze large-scale, mixed, and inaccurately monitored or collected load data online, and to ensure consistent deviation detection and accurate repair of time-series data in each cycle, an online data cleaning and repair method for large-scale distribution network load data is proposed. The method is based on an analysis of the causes and distribution features of different types of abnormal load, and it comprises a density-based method for identifying abnormal load streams and a data repair method based on a collaborative filtering recommendation algorithm. To overcome performance bottlenecks in online analysis of distribution network load data, a parallel implementation on the Hadoop platform is given. Verification with actual distribution network operation data shows that the proposed algorithm and framework preprocess the data effectively and are of value for both practice and research. © 2015, Power System Technology Press. All rights reserved.