Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications

Cited by: 1
Authors
Xie, Han [1 ]
Zheng, Da [2 ]
Ma, Jun [3 ]
Zhang, Houyu [4 ]
Ioannidis, Vassilis N. [5 ]
Song, Xiang [2 ]
Ping, Qing [6 ]
Wang, Sheng [7 ]
Yang, Carl [1 ]
Xu, Yi [4 ]
Zeng, Belinda [4 ]
Chilimbi, Trishul
Affiliations
[1] Emory Univ, Atlanta, GA 30322 USA
[2] Amazon AWS AI, Santa Clara, CA USA
[3] Walgreens AI Lab, Bellevue, WA USA
[4] Amazon Search AI, Seattle, WA USA
[5] Amazon Search AI, Santa Clara, CA USA
[6] Amazon Search AI, Palo Alto, CA USA
[7] Amazon Scholar, Seattle, WA USA
Keywords
Large Language Model; Pre-Training and Fine-Tuning; Graph Neural Network; Heterogeneous Graph
DOI
10.1145/3580305.3599833
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Model pre-training on large text corpora has been demonstrated to be effective for various downstream applications in the NLP domain. In the graph mining domain, a similar analogy can be drawn: pre-training graph models on large graphs in the hope of benefiting downstream graph applications, which has also been explored by several recent studies. However, no existing study has investigated pre-training text-plus-graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas. To address this problem, we propose a framework of graph-aware language model pre-training (GaLM) on a large graph corpus, which incorporates large language models and graph neural networks, together with a variety of fine-tuning methods for downstream applications. We conduct extensive experiments on Amazon's real internal datasets and large public datasets. Comprehensive empirical results and in-depth analysis demonstrate the effectiveness of our proposed methods, along with lessons learned.
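The abstract's core idea of combining a language model's text encodings with graph neural network message passing over a textual graph can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual GaLM architecture: the hash-based `encode_text` stands in for a real language-model encoder, and the mean aggregation and averaging combiner are simplifying assumptions.

```python
# Hedged sketch of a graph-aware embedding: a node's own text encoding is
# combined with a GNN-style aggregation of its neighbors' text encodings.
# All names and the toy hash-based "encoder" are illustrative assumptions.

import hashlib

DIM = 8

def encode_text(text: str) -> list[float]:
    """Stand-in for a language-model text encoder: a deterministic
    hash-based embedding so the sketch runs without any ML library."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]

def aggregate_neighbors(node: str, graph: dict[str, list[str]],
                        embeddings: dict[str, list[float]]) -> list[float]:
    """One message-passing step: mean of the neighbors' embeddings."""
    neighbors = graph.get(node, [])
    if not neighbors:
        return embeddings[node]
    summed = [0.0] * DIM
    for n in neighbors:
        for i, v in enumerate(embeddings[n]):
            summed[i] += v
    return [s / len(neighbors) for s in summed]

def graph_aware_embedding(node: str, texts: dict[str, str],
                          graph: dict[str, list[str]]) -> list[float]:
    """Combine the node's own text encoding with the aggregated
    neighbor information (here: a simple average of the two vectors)."""
    embeddings = {n: encode_text(t) for n, t in texts.items()}
    neigh = aggregate_neighbors(node, graph, embeddings)
    own = embeddings[node]
    return [(a + b) / 2 for a, b in zip(own, neigh)]

# Toy e-commerce graph: products linked by hypothetical co-purchase edges.
texts = {"p1": "wireless mouse", "p2": "mechanical keyboard", "p3": "usb hub"}
graph = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}
vec = graph_aware_embedding("p1", texts, graph)
print(len(vec))  # 8
```

In the actual framework, the text encoder and the aggregation would be trained jointly on the large graph corpus, and the resulting model fine-tuned on downstream applications with different graph schemas.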
Pages: 5270-5281 (12 pages)