50 entries in total
- [31] Off-policy evaluation for tabular reinforcement learning with synthetic trajectories Statistics and Computing, 2024, 34
- [32] Regret Minimization Experience Replay in Off-Policy Reinforcement Learning ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [33] Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
- [34] Stabilizing Off-Policy Deep Reinforcement Learning from Pixels INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
- [35] Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
- [36] Trajectory-Based Off-Policy Deep Reinforcement Learning INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
- [39] Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay IEEE ACCESS, 2021, 9 (09): 93152-93164
- [40] Doubly Robust Off-policy Value Evaluation for Reinforcement Learning INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48