Byzantine-Resilient Decentralized Policy Evaluation With Linear Function Approximation

Cited by: 11
Authors
Wu, Zhaoxian [1, 2, 3]
Shen, Han [4]
Chen, Tianyi [4]
Ling, Qing [1, 2, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Guangdong Prov Key Lab Computat Sci, Guangzhou 510006, Guangdong, Peoples R China
[3] Pazhou Lab, Guangzhou 510300, Guangdong, Peoples R China
[4] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
Funding
National Natural Science Foundation of China
Keywords
Signal processing algorithms; Convergence; Function approximation; Optimization; Approximation algorithms; Task analysis; Mathematical model; Policy evaluation; multi-agent reinforcement learning; temporal-difference learning; Byzantine attacks; DISTRIBUTED OPTIMIZATION; CONVERGENCE; ALGORITHMS;
DOI
10.1109/TSP.2021.3090952
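Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract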
In this paper, we consider the policy evaluation problem in reinforcement learning with agents on a decentralized and directed network. To evaluate the quality of a fixed policy in this decentralized setting, one option is for agents to run decentralized temporal-difference (TD) learning collaboratively. To account for practical scenarios where the state and action spaces are large and malicious attacks emerge, we focus on decentralized TD learning with linear function approximation in the presence of malicious agents (often termed Byzantine agents). We propose a trimmed mean-based Byzantine-resilient decentralized TD algorithm to perform policy evaluation in this setting. We establish the finite-time convergence rate, as well as the asymptotic learning error that depends on the number of Byzantine agents. Numerical experiments corroborate the robustness of the proposed algorithm.
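The two ingredients named in the abstract, trimmed-mean aggregation and TD learning with linear function approximation, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `trimmed_mean` and `td0_step`, the trimming parameter `q`, and the choice of TD(0) are assumptions made here for illustration, and how the paper combines the two steps across the network is not specified in this record.

```python
import numpy as np


def trimmed_mean(vectors, q):
    """Coordinate-wise trimmed mean (generic form, assumed for illustration).

    For every coordinate, discard the q largest and q smallest values
    among the given vectors and average what remains.
    """
    stacked = np.sort(np.asarray(vectors, dtype=float), axis=0)
    n = stacked.shape[0]
    if 2 * q >= n:
        raise ValueError("need more than 2*q vectors to trim q from each side")
    return stacked[q:n - q].mean(axis=0)


def td0_step(theta, phi_s, phi_next, reward, gamma, step_size):
    """One TD(0) update with a linear value estimate V(s) = phi(s) @ theta."""
    td_error = reward + gamma * (phi_next @ theta) - (phi_s @ theta)
    return theta + step_size * td_error * phi_s


# Hypothetical usage: one outlying (possibly Byzantine) vector is trimmed away.
received = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [100.0, -100.0]]
aggregated = trimmed_mean(received, q=1)

# Hypothetical single-transition update starting from a zero parameter vector.
theta = td0_step(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                 reward=1.0, gamma=0.9, step_size=0.5)
```

Trimming per coordinate bounds the influence of up to `q` extreme values in each dimension, which is the intuition behind using it against malicious inputs; the guarantees stated in the abstract apply to the paper's algorithm, not to this sketch.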
Pages: 3839-3853
Number of pages: 15
Related papers
50 in total (first 10 listed)
  • [1] Byzantine-Resilient Decentralized Policy Evaluation with Linear Function Approximation
    Wu, Zhaoxian
    Shen, Han
    Chen, Tianyi
    Ling, Qing
    Ling, Qing (lingqing556@mail.sysu.edu.cn), 1600, Institute of Electrical and Electronics Engineers Inc. (69) : 3839 - 3853
  • [2] BYZANTINE-RESILIENT DECENTRALIZED TD LEARNING WITH LINEAR FUNCTION APPROXIMATION
    Wu, Zhaoxian
    Shen, Han
    Chen, Tianyi
    Ling, Qing
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5040 - 5044
  • [3] BYZANTINE-RESILIENT DECENTRALIZED COLLABORATIVE LEARNING
    Xu, Jian
    Huang, Shao-Lun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 5253 - 5257
  • [4] BYZANTINE-RESILIENT DECENTRALIZED RESOURCE ALLOCATION
    Wang, Runhua
    Liu, Yaohua
    Ling, Qing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 5293 - 5297
  • [5] Byzantine-resilient decentralized network learning
    Yang, Yaohong
    Wang, Lei
    JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2024, 53 (02) : 349 - 380
  • [6] BRIDGE: Byzantine-Resilient Decentralized Gradient Descent
    Fang, Cheng
    Yang, Zhixiong
    Bajwa, Waheed U.
    IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2022, 8 : 610 - 626
  • [7] Byzantine-Resilient Decentralized Stochastic Gradient Descent
    Guo, Shangwei
    Zhang, Tianwei
    Yu, Han
    Xie, Xiaofei
    Ma, Lei
    Xiang, Tao
    Liu, Yang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 4096 - 4106
  • [8] Byzantine-Resilient Resource Allocation Over Decentralized Networks
    Wang, Runhua
    Liu, Yaohua
    Ling, Qing
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 4711 - 4726
  • [9] Basil: A Fast and Byzantine-Resilient Approach for Decentralized Training
    Elkordy, Ahmed Roushdy
    Prakash, Saurav
    Avestimehr, Salman
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2022, 40 (09) : 2694 - 2716
  • [10] ByRDiE: Byzantine-Resilient Distributed Coordinate Descent for Decentralized Learning
    Yang, Zhixiong
    Bajwa, Waheed U.
    IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2019, 5 (04) : 611 - 627