22 entries in total
- [11] Watkins C.J.C.H., Dayan P., Technical note: Q-learning, Machine Learning, 8, 3-4, pp. 279-292, (1992)
- [12] Zou B., Zhang H., Xu Z., Learning from uniformly ergodic Markov chains, Journal of Complexity, 25, 2, pp. 188-200, (2009)
- [13] Tsitsiklis J.N., Van Roy B., An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 42, 5, pp. 674-690, (1997)
- [14] Liu W., Zhuang P., Liang H., et al., Distributed economic dispatch in microgrids based on cooperative reinforcement learning, IEEE Transactions on Neural Networks and Learning Systems, 29, 6, pp. 2192-2203, (2018)
- [15] Wei Q., Liu D., Shi G., A novel dual iterative Q-learning method for optimal battery management in smart residential environments, IEEE Transactions on Industrial Electronics, 62, 4, pp. 2509-2518, (2015)
- [16] Mnih V., Kavukcuoglu K., Silver D., et al., Playing Atari with deep reinforcement learning, (2013)
- [17] Silver D., Lever G., Heess N., et al., Deterministic policy gradient algorithms, Proceedings of the 31st International Conference on Machine Learning, (2014)
- [18] Bianchi R.A.C., Ribeiro C.H.C., Costa A.H.R., Accelerating autonomous learning by using heuristic selection of actions, Journal of Heuristics, 14, 2, pp. 135-168, (2008)
- [19] Ziebart B.D., Maas A., Bagnell J.A., et al., Maximum entropy inverse reinforcement learning, Proceedings of the 23rd National Conference on Artificial Intelligence, pp. 1433-1438, (2008)
- [20] Li C., Cao L., Zhang Y., et al., Knowledge-based deep reinforcement learning: a review, Systems Engineering and Electronics, 39, 11, pp. 2603-2613, (2017)