Near-optimal Deep Reinforcement Learning Policies from Data for Zone Temperature Control

Cited by: 1
Authors
Di Natale, Loris [1 ,2 ]
Svetozarevic, Bratislav [1 ]
Heer, Philipp [1 ]
Jones, Colin N. [2 ]
Affiliations
[1] Urban Energy Systems Laboratory, Swiss Federal Laboratories for Materials Science and Technology (Empa), CH-8600 Dübendorf, Switzerland
[2] Automatic Control Laboratory, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015 Lausanne, Switzerland
Funding
Swiss National Science Foundation
Keywords
DOI
10.1109/ICCA54724.2022.9831914
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Replacing poorly performing existing controllers with smarter solutions will decrease the energy intensity of the building sector. Recently, controllers based on Deep Reinforcement Learning (DRL) have been shown to be more effective than conventional baselines. However, since the optimal solution is usually unknown, it remains unclear whether DRL agents generally attain near-optimal performance or whether a large gap remains to be bridged. In this paper, we investigate the performance of DRL agents compared to the theoretically optimal solution. To that end, we leverage Physically Consistent Neural Networks (PCNNs) as simulation environments, for which optimal control inputs are easy to compute. Furthermore, PCNNs rely solely on data for training, avoiding the difficult physics-based modeling phase while retaining physical consistency. Our results hint that DRL agents not only clearly outperform conventional rule-based controllers but also attain near-optimal performance.
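As a rough illustration of the kind of setup the abstract describes, the sketch below wraps a toy data-driven zone-temperature model in a Gym-style environment with an energy/comfort reward and runs a simple rule-based baseline on it. Everything here is an assumption for illustration only: the surrogate_step dynamics, ZoneTempEnv, the comfort band, and the reward weights are hypothetical stand-ins and do not reproduce the paper's PCNN, cost function, or controllers.

# Hypothetical sketch: a learned zone-temperature model used as a DRL
# environment, in the spirit of the PCNN-based setup in the paper.
# The dynamics below are a toy linear surrogate, NOT the paper's PCNN;
# all names, bounds, and reward weights are illustrative assumptions.
import numpy as np

def surrogate_step(T_zone, heat_power, T_out):
    """Toy stand-in for a trained one-step temperature prediction (degC)."""
    # Energy-balance-like update: losses to the outside plus heating gain.
    return T_zone + 0.1 * (T_out - T_zone) + 0.03 * heat_power

class ZoneTempEnv:
    """Gym-style environment: state = (zone temp, outdoor temp), action = heating power in [0, 10] kW."""

    def __init__(self, comfort=(21.0, 23.0), horizon=96):
        self.comfort = comfort
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.t = 0
        self.T_zone = 20.0
        return np.array([self.T_zone, self._t_out()], dtype=np.float32)

    def _t_out(self):
        # Crude sinusoidal outdoor temperature over one day (15-minute steps).
        return 5.0 + 5.0 * np.sin(2 * np.pi * self.t / self.horizon)

    def step(self, action):
        power = float(np.clip(action, 0.0, 10.0))
        self.T_zone = surrogate_step(self.T_zone, power, self._t_out())
        self.t += 1
        low, high = self.comfort
        discomfort = max(low - self.T_zone, 0.0) + max(self.T_zone - high, 0.0)
        # Reward trades off energy use against comfort violations.
        reward = -0.1 * power - 1.0 * discomfort
        done = self.t >= self.horizon
        obs = np.array([self.T_zone, self._t_out()], dtype=np.float32)
        return obs, reward, done, {}

def rule_based_controller(obs, comfort=(21.0, 23.0)):
    """Bang-bang baseline: full heating below the comfort band, off otherwise."""
    return 10.0 if obs[0] < comfort[0] else 0.0

if __name__ == "__main__":
    env = ZoneTempEnv()
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, r, done, _ = env.step(rule_based_controller(obs))
        total += r
    print(f"Rule-based episode return: {total:.2f}")

In the paper's actual setup, the trained PCNN would play the role of the surrogate dynamics, a DRL agent would be trained on the resulting environment, and its return would be compared both to a rule-based baseline of this kind and to the optimal control inputs computed directly from the model.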
Pages: 698 - 703
Page count: 6
Related Papers
50 records in total
  • [1] Polynomial-time reinforcement learning of near-optimal policies
    Pivazyan, K
    Shoham, Y
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002, : 205 - 210
  • [2] A Bayesian reinforcement learning approach in markov games for computing near-optimal policies
    Julio B. Clempner
    Annals of Mathematics and Artificial Intelligence, 2023, 91 : 675 - 690
  • [3] A Bayesian reinforcement learning approach in markov games for computing near-optimal policies
    Clempner, Julio B.
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2023, 91 (05) : 675 - 690
  • [4] Near-Optimal Reinforcement Learning in Polynomial Time
    Michael Kearns
    Satinder Singh
    Machine Learning, 2002, 49 : 209 - 232
  • [5] Near-optimal reinforcement learning in polynomial time
    Kearns, M
    Singh, S
    MACHINE LEARNING, 2002, 49 (2-3) : 209 - 232
  • [6] Near-optimal Regret Bounds for Reinforcement Learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11 : 1563 - 1600
  • [7] Near-optimal regret bounds for reinforcement learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    Journal of Machine Learning Research, 2010, 11 : 1563 - 1600
  • [8] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [9] Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
    Asiain, Erick
    Clempner, Julio B.
    Poznyak, Alexander S.
    SOFT COMPUTING, 2019, 23 (11) : 3591 - 3604
  • [10] Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
    Erick Asiain
    Julio B. Clempner
    Alexander S. Poznyak
    Soft Computing, 2019, 23 : 3591 - 3604