50 items in total
- [1] The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
- [2] Cooperative Online Learning in Stochastic and Adversarial MDPs. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
- [3] Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
- [4] Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 6202-6210
- [5] Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024: 13572-13580
- [6] Online Learning with Off-Policy Feedback in Adversarial MDPs. PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024: 3697-3705
- [7] Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited. ALGORITHMIC LEARNING THEORY, VOL 132, 2021
- [8] Meta Learning MDPs with Linear Transition Models. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022: 5928-5948
- [9] Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024
- [10] Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019