The energy management problem (EMP) has attracted worldwide scholarly attention as a means of achieving economic operation of the microgrid (MG). To overcome the lack of flexibility in coping with uncertainties and topology changes, this paper proposes a multi-agent proximal policy optimization (MAPPO) algorithm. Unlike the offline-training, online-implementation mode, the proposed decentralized MAPPO algorithm is both trained and applied online, which yields higher optimization efficiency and a lower communication burden. An optimization model is established that takes into account user satisfaction, the renewable energy utilization rate, and operating costs. Because the power balance constraint is difficult to satisfy when the EMP is solved with reinforcement learning (RL), a novel power imbalance penalty is designed. Compared with the traditional penalty function, the proposed penalty function effectively avoids power imbalance. Finally, 24-hour energy management results are provided to verify the effectiveness of the proposed algorithm, and the proposed MAPPO is compared with several popular multi-agent RL algorithms. Simulation results show that the proposed algorithm is more efficient and obtains better energy management strategies.