50 entries in total
- [41] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey. Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45(09): 1926-1945
- [43] A Temporal-Difference Approach to Policy Gradient Estimation. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
- [44] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016
- [45] Universal Off-Policy Evaluation. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
- [46] Fast Link Scheduling in Wireless Networks Using Regularized Off-Policy Reinforcement Learning. IEEE Networking Letters, 2023, 5(02): 86-90
- [47] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020
- [48] Off-policy evaluation for tabular reinforcement learning with synthetic trajectories. Statistics and Computing, 2024, 34
- [49] Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022: 10598-10632
- [50] Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits. PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024: 733-741