Aligning Large Language Models via Fine-grained Supervision

Cited by: 0
Authors
Liang, Dehong [1 ,2 ]
Qiu, Liang [2 ]
Kim, Minseok [2 ]
Ladhak, Faisal [2 ]
Do, Jaeyoung [3 ]
Affiliations
[1] UCLA, Dept Stat, Los Angeles, CA 90095 USA
[2] Amazon, Palo Alto, CA 94303 USA
[3] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
Abstract
Pre-trained large language models (LLMs) excel at producing coherent text, yet their outputs may be untruthful, toxic, or misaligned with user expectations. Current approaches rely on reinforcement learning from human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences over LLM outputs into a feedback signal that guides the model's learning process. However, because this approach operates on sequence-level feedback, it lacks the precision to identify the exact parts of the output that affect user preferences. To address this gap, we propose a method to enhance LLM alignment through fine-grained, token-level supervision. Specifically, we ask annotators to minimally edit the less preferred responses within a standard reward-modeling dataset to make them more favorable, ensuring that changes are made only where necessary while most of the original content is retained. The refined dataset is used to train a token-level reward model, which is then used to train our fine-grained Proximal Policy Optimization (PPO) model. Our experimental results demonstrate that this approach achieves an absolute improvement of up to 5.1% in win rate against the reference model, compared with the traditional PPO model.
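The pipeline the abstract describes (minimal edits, then token-level rewards, then fine-grained PPO) hinges on converting an annotator's edit into per-token supervision. Below is a minimal sketch, not the authors' implementation: it assumes naive whitespace tokenization and uses a standard sequence diff (Python's difflib) to mark which tokens of the rejected response the annotator replaced or deleted. The function name token_level_labels and the example strings are illustrative assumptions.

```python
# Sketch: derive token-level reward labels from a minimally edited response pair.
# Tokens the annotator left untouched get a neutral label (0); tokens that were
# replaced or deleted in the preferred edit are penalized (-1). Whitespace
# tokenization is used purely for illustration.
import difflib

def token_level_labels(rejected: str, edited: str):
    """Label each token of the rejected response: -1 if the annotator
    changed or removed it in the edited (preferred) version, 0 otherwise."""
    rej_tokens = rejected.split()
    edi_tokens = edited.split()
    labels = [0] * len(rej_tokens)
    matcher = difflib.SequenceMatcher(a=rej_tokens, b=edi_tokens)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace"/"delete" spans are the parts the annotator edited away.
        if op in ("replace", "delete"):
            for i in range(i1, i2):
                labels[i] = -1
    return rej_tokens, labels

if __name__ == "__main__":
    rejected = "The Eiffel Tower is located in Berlin and attracts many tourists"
    edited = "The Eiffel Tower is located in Paris and attracts many tourists"
    for tok, lab in zip(*token_level_labels(rejected, edited)):
        print(f"{tok:>10s} {lab:+d}")   # only "Berlin" is labeled -1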
Pages: 673 - 680
Number of pages: 8
Related papers
50 records in total
  • [1] Fine-grained detoxification framework via instance-level prefixes for large language models
    Yi, Xin
    Wang, Linlin
    Wang, Xiaoling
    He, Liang
    NEUROCOMPUTING, 2025, 611
  • [2] Beyond Binary Classification: A Fine-Grained Safety Dataset for Large Language Models
    Yu, Jia
    Li, Long
    Lan, Zhenzhong
    IEEE ACCESS, 2024, 12 : 64717 - 64726
  • [3] Fine-grained Affective Processing Capabilities Emerging from Large Language Models
    Broekens, Joost
    Hilpert, Bernhard
    Verberne, Suzan
    Baraka, Kim
    Gebhard, Patrick
    Plaat, Aske
    2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, ACII, 2023
  • [4] Fine-Grained Task Planning for Service Robots Based on Object Ontology Knowledge via Large Language Models
    Li, Xiaodong
    Tian, Guohui
    Cui, Yongcheng
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (08): : 6872 - 6879
  • [5] A fine-grained comparison of pragmatic language understanding in humans and language models
    Hu, Jennifer
    Floyd, Sammy
    Jouravlev, Olessia
    Fedorenko, Evelina
    Gibson, Edward
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 4194 - 4213
  • [6] Densifying Supervision for Fine-Grained Visual Comparisons
    Yu, Aron
    Grauman, Kristen
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (10-11) : 2704 - 2730
  • [7] Fine-grained Image Classification via Combining Vision and Language
    He, Xiangteng
    Peng, Yuxin
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 7332 - 7340
  • [8] ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
    Yan, Siming
    Bai, Min
    Chen, Weifeng
    Zhou, Xiong
    Huang, Qixing
    Li, Li Erran
    COMPUTER VISION - ECCV 2024, PT LXI, 2025, 15119 : 37 - 53
  • [9] CafeLLM: Context-Aware Fine-Grained Semantic Clustering Using Large Language Models
    Huang, Ryan Yuki
    Small, Colin Robert
    GENERALIZING FROM LIMITED RESOURCES IN THE OPEN WORLD, GLOW-IJCAI 2024, 2024, 2160 : 66 - 81