SsciBERT: a pre-trained language model for social science texts

Cited by: 0
|
Authors
Si Shen
Jiangfeng Liu
Litao Lin
Ying Huang
Lin Zhang
Chang Liu
Yutong Feng
Dongbo Wang
Affiliations
[1] Group of Science and Technology Full-Text Knowledge Mining, School of Economics & Management, Nanjing University of Science and Technology
[2] College of Information Management, Nanjing Agricultural University
[3] Center for Science, Technology & Education Assessment (CSTEA), Wuhan University
[4] Wuhan University
Source
Scientometrics | 2023, Vol. 128
Keywords
Social science; Natural language processing; Pre-trained models; Text analysis; BERT;
DOI: not available
Abstract
The academic literature of the social sciences records human civilization and studies human social problems. As this literature grows in scale, quickly finding existing research on relevant issues has become an urgent need for researchers. Previous studies, such as SciBERT, have shown that pre-training on domain-specific texts can improve performance on natural language processing tasks. However, no pre-trained language model for the social sciences has been available so far. In light of this, the present research proposes a pre-trained model based on abstracts published in Social Science Citation Index (SSCI) journals. The models, which are available on GitHub (https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT), show excellent performance on discipline classification, abstract structure–function recognition, and named entity recognition tasks with social science literature.
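As a rough illustration of how such a checkpoint could be applied to one of the downstream tasks named in the abstract, the minimal sketch below loads a BERT-style model with the Hugging Face transformers library and attaches a sequence-classification head for something like discipline classification. The model identifier KM4STfulltext/SSCI-BERT-e2, the five-label setup, and the sample abstract are assumptions made for illustration only; the actual released weights and evaluation data are documented in the GitHub repository linked above, and the classification head here is freshly initialized, so it would need fine-tuning on labeled SSCI abstracts before its predictions are meaningful.

# Minimal sketch: classify a social science abstract with a BERT-style
# checkpoint pre-trained on SSCI abstracts (model id below is an assumption;
# see https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT for the
# released weights).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "KM4STfulltext/SSCI-BERT-e2"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels depends on the label set (e.g., the number of SSCI disciplines);
# 5 is purely illustrative. The classification head is randomly initialized
# and must be fine-tuned before use.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=5)

abstract = (
    "This study examines the relationship between social capital "
    "and educational attainment in rural communities."
)
inputs = tokenizer(abstract, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = logits.argmax(dim=-1).item()
print(predicted_label)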
Pages: 1241-1263
Number of pages: 22