Robustness Assessment of Asynchronous Advantage Actor-Critic Based on Dynamic Skewness and Sparseness Computation: A Parallel Computing View

Cited by: 0

Authors:

Chen, Tong ^{[1]}

Liu, Ji-Qiang ^{[1]}

Li, He ^{[1]}

Wang, Shuo-Ru ^{[1]}

Niu, Wen-Jia ^{[1]}

Tong, En-Dong ^{[1]}

Chang, Liang ^{[2]}

Chen, Qi Alfred ^{[3]}

Li, Gang ^{[4]}

Affiliations:

[1] Beijing Jiaotong Univ, Beijing Key Lab Secur & Privacy Intelligent Trans, Beijing 100044, Peoples R China

[2] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Guilin 541004, Peoples R China

[3] Univ Calif Irvine, Donald Bren Sch Informat & Comp Sci, Irvine, CA 92697 USA

[4] Deakin Univ, Ctr Cyber Secur Res & Innovat, Geelong, Vic 3216, Australia

Source:

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY | 2021 / Vol. 36 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

robustness assessment; skewness; sparseness; asynchronous advantage actor-critic; reinforcement learning; NEURAL-NETWORKS;

DOI:

10.1007/s11390-021-1217-z

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Reinforcement learning as autonomous learning is greatly driving artificial intelligence (AI) development to practical applications. Having demonstrated the potential to significantly improve synchronously parallel learning, the parallel computing based asynchronous advantage actor-critic (A3C) opens a new door for reinforcement learning. Unfortunately, the acceleration's influence on A3C robustness has been largely overlooked. In this paper, we perform the first robustness assessment of A3C based on parallel computing. By perceiving the policy's action, we construct a global matrix of action probability deviation and define two novel measures of skewness and sparseness to form an integral robustness measure. Based on such static assessment, we then develop a dynamic robustness assessing algorithm through situational whole-space state sampling of changing episodes. Extensive experiments with different combinations of agent number and learning rate are implemented on an A3C-based pathfinding application, demonstrating that our proposed robustness assessment can effectively measure the robustness of A3C, which can achieve an accuracy of 83.3%.

Pages: 1002 - 1021

Page count: 20

24 items in total

[1] Robustness Assessment of Asynchronous Advantage Actor-Critic Based on Dynamic Skewness and Sparseness Computation: A Parallel Computing View
Tong Chen
Ji-Qiang Liu
He Li
Shuo-Ru Wang
Wen-Jia Niu
En-Dong Tong
Liang Chang
Qi Alfred Chen
Gang Li
[J]. Journal of Computer Science and Technology, 2021, 36 : 1002 - 1021
[2] Dynamic service function chain placement in mobile computing: An asynchronous advantage actor-critic based approach
Jiang, Heling
Xia, Hai
Zare, Mansoureh
[J]. TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2024, 35 (08):
[3] Adversarial retraining attack of asynchronous advantage actor-critic based pathfinding
Chen Tong
Liu Jiqiang
Xiang Yingxiao
Niu Wenjia
Tong Endong
Wang Shuoru
Li He
Chang Liang
Li Gang
Alfred, Chen Qi
[J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (05) : 2323 - 2346
[4] Traffic signal control method based on asynchronous advantage actor-critic
Ye, Baolin
Sun, Ruitao
Wu, Weimin
Chen, Bin
Yao, Qing
[J]. Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (08) : 1671 - 1680
[5] Optimization of Robot Environment Interaction Based on Asynchronous Advantage Actor-Critic Algorithm
Xu, Jitang
Chen, Qiang
[J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (06) : 1350 - 1359
[6] VMP-A3C: Virtual machines placement in cloud computing based on asynchronous advantage actor-critic algorithm
Wei, Pengcheng
Zeng, Yushan
Yan, Bei
Zhou, Jiahui
Nikougoftar, Elaheh
[J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (05)
[7] Workflow scheduling based on asynchronous advantage actor-critic algorithm in multi-cloud environment
Tang, Xuhao
Liu, Fagui
Wang, Bin
Xu, Dishi
Jiang, Jun
Wu, Qingbo
Chen, C. L. Philip
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 258
[8] Resource allocation Algorithm of Service Function Chain Based on Asynchronous Advantage Actor-Critic Learning
Tang Lun
He Xiaoyu
Wang Xiao
Tan Qi
Hu Yanjuan
Chen Qianbin
[J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (06) : 1733 - 1741
[9] Design and application of adaptive PID controller based on asynchronous advantage actor-critic learning method
Sun, Qifeng
Du, Chengze
Duan, Youxiang
Ren, Hui
Li, Hongqiang
[J]. WIRELESS NETWORKS, 2021, 27 (05) : 3537 - 3547
[10] A new noise network and gradient parallelisation-based asynchronous advantage actor-critic algorithm
Fei, Zhengshun
Wang, Yanping
Wang, Jinglong
Liu, Kangling
Huang, Bingqiang
Tan, Ping
[J]. IET CYBER-SYSTEMS AND ROBOTICS, 2022, 4 (03) : 175 - 188
