50 items in total
- [2] Trajectory-Based Off-Policy Deep Reinforcement Learning [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
- [3] Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence [J]. IFAC PAPERSONLINE, 2020, 53 (02) : 1563 - 1568
- [4] Value-Aware Importance Weighting for Off-Policy Reinforcement Learning [J]. CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 745 - 763
- [5] Safe and efficient off-policy reinforcement learning [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
- [6] Bounds for Off-policy Prediction in Reinforcement Learning [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017 : 3991 - 3997
- [8] Off-Policy Reinforcement Learning with Delayed Rewards [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
- [10] A perspective on off-policy evaluation in reinforcement learning [J]. Frontiers of Computer Science, 2019, 13 : 911 - 912