Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Authors
Zhang, Ruiyi [1]
Chen, Changyou [2]
Gan, Zhe [3]
Wen, Zheng [4]
Wang, Wenlin [1]
Carin, Lawrence [1]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] SUNY Buffalo, Buffalo, NY USA
[3] Microsoft Dynam 365 AI Res, Redmond, WA USA
[4] DeepMind, London, England
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) has been widely studied for improving sequence-generation models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore manifest model bias. Further, the sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching. To further exploit it, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences for enhanced exploration and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with Wasserstein trust-regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate the proposed approach consistently leads to improved performance.
Pages: 422-432
Number of pages: 11