50 entries in total
- [21] Online Attentive Kernel-Based Off-Policy Temporal Difference Learning APPLIED SCIENCES-BASEL, 2024, 14 (23)
- [23] Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
- [24] Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method IEEE ACCESS, 2022, 10 : 107077 - 107094
- [25] Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence IFAC PAPERSONLINE, 2020, 53 (02) : 1563 - 1568
- [26] Off-Policy Evaluation via Off-Policy Classification ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
- [27] Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
- [28] Learning Action Embeddings for Off-Policy Evaluation ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, 2024, 14608 : 108 - 122
- [30] A perspective on off-policy evaluation in reinforcement learning Frontiers of Computer Science, 2019, 13 : 911 - 912