Deep reinforcement learning with robust augmented reward sequence prediction for improving GNSS positioning

Cited by: 0
|
Authors
Tang, Jianhao [1 ,2 ]
Li, Zhenni [1 ,3 ]
Yu, Qingsong [4 ]
Zhao, Haoli [5 ]
Zeng, Kungan [6 ]
Zhong, Shiguang [4 ]
Wang, Qianming [1 ,7 ]
Xie, Kan [1 ,2 ]
Kuzin, Victor [8 ]
Xie, Shengli [2 ,9 ]
Affiliations
[1] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
[2] Guangdong Key Lab IoT Informat Technol, Guangzhou 510006, Peoples R China
[3] Guangdong HongKong Macao Joint Lab Smart Discrete, Guangzhou 510006, Guangdong, Peoples R China
[4] Guangzhou Haige Commun Grp Inc Co, Guangzhou 510006, Peoples R China
[5] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
[6] Sun Yat Sen Univ, Sch Software Engn, Zhuhai 519000, Peoples R China
[7] Taidou Microelect Technol Co Ltd, Guangzhou 510006, Peoples R China
[8] Acad Russian Engn Acad, Moscow, Russia
[9] Key Lab Intelligent Detect & Internet Things Mfg G, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Robust augmented reward sequence prediction; Generalization; GNSS positioning;
DOI
10.1007/s10291-025-01824-w
Chinese Library Classification
TP7 [Remote Sensing Technology];
Discipline Classification Codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
Data-driven technologies have shown promising potential for improving GNSS positioning, since they can analyze observation data to learn the complex hidden characteristics of system models without rigorous prior assumptions. However, in complex urban areas, the input observation data contain task-irrelevant noisy GNSS measurements arising from stochastic effects such as signal reflections from tall buildings. Moreover, a data distribution shift between the training and testing phases arises in dynamically changing environments. These problems limit the robustness and generalizability of data-driven GNSS positioning methods in urban areas. In this paper, a novel deep reinforcement learning (DRL) method is proposed to improve the robustness and generalizability of data-driven GNSS positioning. Specifically, to address the data distribution shift in dynamically changing environments, the robust Bellman operator (RBO) is incorporated into the DRL optimization to model deviations in the data distribution and to enhance generalizability. To improve robustness against task-irrelevant noisy GNSS measurements, long-term reward sequence prediction (LRSP) is adopted to learn robust representations by extracting task-relevant information from GNSS observations. We thus develop a DRL method with robust augmented reward sequence prediction to correct the rough position solved by model-based methods. Moreover, a novel real-world GNSS positioning dataset is built, containing different scenes in urban areas. Our experiments were conducted on the public dataset Google smartphone decimeter challenge 2022 (GSDC2022) and the built dataset Guangzhou GNSS version 2 (GZGNSS-V2), and demonstrate that the proposed method can outperform model-based and state-of-the-art data-driven methods in terms of generalizability across different environments.
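The robust Bellman operator mentioned in the abstract replaces the nominal expected next-state value with a worst-case value over an uncertainty set of transition distributions. The following is a minimal illustrative sketch of one tabular robust Bellman backup under an R-contamination uncertainty set; the radius `rho`, the tabular setting, and the function name are assumptions for illustration, not the authors' actual formulation or implementation.

```python
import numpy as np

def robust_bellman_update(Q, s, a, r, next_states, probs, gamma=0.99, rho=0.1):
    """One robust Bellman backup on a tabular Q (shape: [n_states, n_actions]).

    Mixes the nominal expected next-state value with the worst-case value
    over the whole state space (R-contamination set of radius rho), so the
    learned policy hedges against deviations between the training and
    deployment transition distributions.
    """
    next_vals = Q.max(axis=1)                 # max_a' Q(s', a') for every state
    nominal = probs @ next_vals[next_states]  # E_P[V(s')] under the nominal model
    worst = next_vals.min()                   # adversarial (worst-case) next value
    target = r + gamma * ((1.0 - rho) * nominal + rho * worst)
    Q[s, a] = target
    return Q
```

With `rho = 0` this reduces to the standard Bellman backup; larger `rho` trades nominal performance for robustness to distribution shift.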
Pages: 28
Related Papers
50 records in total
  • [31] Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping
    Miranda, Victor R. F.
    Neto, Armando A.
    Freitas, Gustavo M.
    Mozelli, Leonardo A.
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71 (06) : 6013 - 6020
  • [32] Improving Deep Reinforcement Learning via Transfer
    Du, Yunshu
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2405 - 2407
  • [33] Dopamine encodes a quantitative reward prediction error for reinforcement learning
    Glimcher, PW
    Mullette-Gillman, OA
    Bayer, HM
    Lau, B
    Rutledge, R
    NEUROPSYCHOPHARMACOLOGY, 2005, 30 : S27 - S27
  • [34] Learning positioning policies for mobile manipulation operations with deep reinforcement learning
    Iriondo, Ander
    Lazkano, Elena
    Ansuategi, Ander
    Rivera, Andoni
    Lluvia, Iker
    Tubio, Carlos
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (09) : 3003 - 3023
  • [35] Learning positioning policies for mobile manipulation operations with deep reinforcement learning
    Ander Iriondo
    Elena Lazkano
    Ander Ansuategi
    Andoni Rivera
    Iker Lluvia
    Carlos Tubío
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3003 - 3023
  • [36] Inferring reward prediction errors in patients with schizophrenia: a dynamic reward task for reinforcement learning
    Li, Chia-Tzu
    Lai, Wen-Sung
    Liu, Chih-Min
    Hsu, Yung-Fong
    FRONTIERS IN PSYCHOLOGY, 2014, 5
  • [37] Robust Reinforcement Learning via Progressive Task Sequence
    Li, Yike
    Tian, Yunzhe
    Tong, Endong
    Niu, Wenjia
    Liu, Jiqiang
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 455 - 463
  • [38] Robust Reward-Free Actor-Critic for Cooperative Multiagent Reinforcement Learning
    Lin, Qifeng
    Ling, Qing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (12) : 17318 - 17329
  • [39] Reward-based participant selection for improving federated reinforcement learning
    Lee, Woonghee
    ICT EXPRESS, 2023, 9 (05): : 803 - 808
  • [40] Diversity-augmented intrinsic motivation for deep reinforcement learning
    Dai, Tianhong
    Du, Yali
    Fang, Meng
    Bharath, Anil Anthony
    NEUROCOMPUTING, 2022, 468 : 396 - 406